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•JThe  fundamental  efficiency  of  adaptive  algorithms  is  analyzed.  It  is 
found  that  noise  in  the  adaptive  weights  increases  with  convergence  speed. 
This  causes  loss  in  mean-square-error  performance.  Efficiency  is  con¬ 
sidered  from  the  point  of  view  of  misadjustment  versus  speed  of  conver¬ 
gence.  A  new  version  of  the  U*S  algorithm  based  on  Newton's  method  is 
anlayzed  and  shown  to  make  maximally  efficient  use  of  real-time  input 
data.  The  performance  of  this  algorithm  is  not  affected  by  eigenvalue 
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disparity.  Practical  algorithms  can  be  devised  that  closely  approximate 
Newton's  method.  In  certain  cases,  the  steepest  descent  version  of  IMS 
performs  as  well  as  Newton's  method. 


The  efficieny  of  adaptive  algorithms  with  nonstationary  input  environ)- 
ments  is  analyzed  where  signals,  jammers,  and  background  noises  can  be  of 
a  transient  and  nonstationary  nature.  Such  environments  have  been  modeled| 
in  terms'of  moving  paraboloidal  mean- square- error  performance  functions. 

The  bottom  of  the  performance  bowl  can  be  assumed  to  move  slowly,  with  a 
randomly  correlated  Markov  character.  Exponential  time  weighting,  inher¬ 
ent  in  the  IMS  algorithm,  can  give  optimal  performance  with  the  proper 
choice  of  the  parameter  u  when  the  motion  of  the  bottom  of  the  bowl  is 
first  order  Markov.  With  higher-order  Markov  activity,  exponential  time 
weighting  is  no  longer  optimal.  Higher-order  adaptive  algorithms  are 
devised  for  nonstationary  input  applications. 

- y”  A  new  adaptive  filtering  method  for  broadband  adaptive  beamforming 

is  described  which  uses  both  poles  and  zeros  in  the  adaptive  signal  . 

•  filtering  paths  from  the  antenna  elements  to  the  final  array  output .  >~When| 
directly  adapting  feedback  coefficients,  difficulties  arise  as  the  mean- 
square-error  is  not  a  quadratic  function  of  the  weights  but  is,  in  fact, 
multimodal.  There  are  questions  of  process  instability,  convergence  to 
local  rather  than  global  optima,  and  generally  slow  convergence  that  occur)* 
with  all  of  the  known  algorithms  that  have  been  proposed  for  adapting  feed 
back  filters.  However,  it  is  possible  to  simultaneously  adapt  feedforward 
and  feedback  filter  coefficients  by  adapting  feedforward  filters  only.  The 
advantage  of  this  approach  comes  from  the  quadratic  nature  of  the  mean- 
square-error  function.  This  method  does  not  have  problems  of  instability 
and  convergence  to  local  optima,  and  it  converges  essentially  as  fast  as 
the  conventional  zeros-only  system.  It  has  the  disadvantage  of  not  mini¬ 
mizing  the  true  mean-square-error,  but  does  minimize  the  mean  square  of  a 
filtered  version  of  the  true  error.  The  effects  of  not  minimizing  true 
mean-square-error  are  being  analyzed.  Some  simulation  experiments  have 
been  conducted  to  compare  the  new  approach  with  conventional  beamformers 
using  an  equal  number  of  variable  weights.  Preliminary  results  show  deepef 
and  sharper  nulling  with  the  new  approach  than  with  the  conventional 
approach. 
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Abstract 

This  annual  interim  report  is  organized  in  three  parts: 

Part  1 

The  fundamental  efficiency  of  adaptive  algorithms  is  analyzed.  It  is  found 
that  noise  in  the  adaptive  weights  increases  with  convergence  speed.  This 
causes  loss  in  mean-square-error  performance.  Efficiency  is  considered  from 
the  point  of  view  of  misadjustment  versus  speed  of  convergence.  A  new  version 
of  the  LMS  algorithm  based  on  Newton's  method  is  analyzed  and  shown  to  make 
as  efficient  use  of  real-time  input  data  as  can  be.  The  performance  of  this  algo¬ 
rithm  is  not  affected  by  eigenvalue  disparity.  Practical  algorithms  can  be  dev¬ 
ised  that  closely  approximate  Newton’s  method.  In  certain  cases,  the  steepest 
descent  version  of  LMS  performs  as  well  as  Newton's  method. 

Part  n 

This  part  analyzes  the  efficiency  of  adaptive  algorithms  with  nonstationary 
input  environments,  i.e  signals,  jammers,  and  background  noises  can  be  of  a 
transient  and  nonstationary  nature.  Such  environments  have  been  modeled  in 
terms  of  moving  paraboloidal  mean-square-error  performance  functions.  The 
bottom  of  the  performance  bowl  can  be  assumed  to  move  slowly,  with  a  ran¬ 
domly  correlated  Markov  character.  Exponential  time  weighting,  inherent  in  the 
LMS  algorithm,  can  give  optimal  performance  with  the  proper  choice  of  the 
parameter  n  when  the  motion  of  the  bottom  of  the  bowl  is  first  order  Markov. 
With  higher-order  Markov  activity,  exponential  time  weighting  is  no  longer 
optimal.  Higher  order  adaptive  algorithms  are  being  devised  for  nonstationary 
input  applications. 
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Part  111 

This  part  introduces  a  new  adaptive  filtering  method  for  broadband  adap¬ 
tive  beamforming.  It  uses  both  poles  and  zeros  in  the  adaptive  signal  filtering 
paths  from  the  antenna  elements  to  the  final  array  output. 

When  directly  adapting  feedback  coefficients,  difficulties  arise  as  the  mean 
square  error  is  not  a  quadratic  function  of  the  weights  but  is,  in  fact,  multimo¬ 
dal.  There  are  questions  of  process  instability,  hang  up  on  local  rather  than  glo¬ 
bal  optima,  and  generally  slow  convergence  that  occurs  with  all  of  the  known 
algorithms  that  have  been  proposed  for  adapting  feedback  filters. 

Our  methodology  effectively  permits  simultaneous  adaptation  of  feedfor¬ 
ward  and  feedback  filter  coefficients  by  adapting  feedforward  filters  only  The 
advantage  of  this  approach  comes  from  the  quadratic  nature  of  the  mean  square 
error  function.  Our  method  does  not  have  problems  of  instability  and  conver¬ 
gence  to  local  optima,  and  it  converges  essentially  as  fast  as  the  conventional 
zeros-only  system.  It  has  the  disadvantage  of  not  minimizing  the  true  mean- 
square-error.  but  does  minimize  the  mean  square  of  a  filtered  version  of  the 
true  error.  The  effects  of  not  minimizing  true  mean  square  error  are  being 
analyzed. 

Preliminary  experimental  results  comparing  systems  with  equal  numbers  of 
variable  weights,  show  deeper  and  sharper  jammer  nulling  than  would  be  possi¬ 
ble  with  the  conventional  approach. 
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PARTI 


ADAPTIVE  ALGORITHM  EFFICIENCY  i 

i 

i 

i 
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1.1  INTRODUCTION 


The  first  part  of  this  report  deals  with  the  efficiency  of  adaptive  algorithms, 
and  suggests  a  way  to  improve  on  the  current  techniques.  We  include  a  general 
discussion  of  the  issues  involved  in  evaluating  and  comparing  various  adaptive 
algorithms.  Also,  an  optimal  form  of  adaptive  algorithm  is  introduced,  called 
the  LMS/Newton  algorithm.  This  is  a  stochastic  gradient  descent  algorithm 
based  on  Newton's  Method,  with  a  statistical  performance  which  is  as  efficient  in 
its  data  usage  as  possible.  Eigenvalue  spread  does  not  affect  the  rate  of  conver¬ 
gence  of  LMS/Newton.  The  standard  LMS  algorithm  and  LMS/Newton  algorithm 
are  compared,  and  conditions  are  established  when  the  two  algorithms  have  the 
same  average  performance. 
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1.2  ALGORITHM  EFFICIENCY 

Two  forms  of  the  LMS  adaptive  algorithm  will  be  discussed  here,  the  "usual" 
algorithm  based  on  the  method  of  steepest  descent  [  1-5],  and  an  idealized  algo¬ 
rithm  based  on  Newton’s  method.  These  algorithms  will  be  considered  from  the 
points  of  view  of:  (a)  rate  of  convergence,  (b)  efficiency  of  statistical  perfor¬ 
mance. 

Speeding  up  a  given  adaptive  process  generally  requires  that  the  adaptive 
parameters  (weights,  etc.)  take  values  based  on  averaging  over  less  input  data. 
The  result  is  increased  parameter  noise  and  reduced  average  system  perfor¬ 
mance.  When  using  a  specific  algorithm,  there  is  generally  a  tradeoff  between 
speed  of  convergence  and  average  statistical  performance. 

Two  algorithms  may  be  compared  with  each  other  when  applied  to  the  same 
adaptation  task  by  adjusting  their  rates  of  convergence  to  cause  the  same 
effective  parameter  noise.  As  such,  the  more  efficient  algorithm  converges  fas¬ 
ter.  Effective  parameter  noise  is  that  attribute  of  the  noise  that  causes  loss  in 
system  performance. 

The  steepest  descent  version  of  the  LMS  algorithm  is 

W}¥1  =  Wj  +  2 fitjXj  .  (1.1) 

The  p01  mode  of  the  mean  square  error  learning  curve  has  a  time  constant  given 
by 


It  is  seen  that  increasing  the  convergence  factor  fi  speeds  up  the  adaptive  pro¬ 
cess  by  causing  the  adaptive  time  constants  to  be  reduced.  However,  to  insure 
stability  in  the  mean,  n  must  be  kept  within  the  bounds 

r-—  >  fl  >  0 

Anuj< 


(1-3) 
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where  Xm**  is  the  largest  eigenvalue  of  the  input  correlation  matrix  R.  After 
adaptive  transients  die  out,  noise  in  the  weights  causes,  on  the  average,  an 
increase  in  mean  square  error  over  the  theoretical  minimum  mean  square 
error.  The  misadjustment  M  has  been  defined  [1-5]  as  the  dimensionless  ratio  of 
the  average  excess  mean  square  error  to  the  minimum  mean  square  error.  For 
the  steepest  descent  LMS  algorithm, 


M  -  fj.  trace  R  = 


n 

i 

4 

(1.4) 


Increasing  fi  speeds  up  the  adaptive  process  but  increases  the  misadjustment. 

A  "Newton's  method"  version  of  the  LMS  algorithm  premultiplies  the  instan¬ 
taneous  gradient  estimate  2 SjXj  by  the  inverse  of  R.  The  algorithm  is 

Wj+i  =  Wt  +  2Mo *jR-%  ■  (15) 

The  scaling  constant  (the  average  of  the  eigenvalues)  has  been  included  for 
convenience.  It  can  be  shown  that  premultiplication  by  /f_1  causes  each  adap¬ 
tive  step  to  be  taken  not  along  the  maximum  gradient  but  instead  in  the  direc¬ 
tion  toward  the  bottom  of  a  quadratic  bowl.  The  effect  is  very  much  like  applica¬ 
tion  or  steepest  descent  when  all  eigenvalues  are  equal.  The  eccentricity  of  the 
performance  function  is  eliminated  by  Newton's  method  as  specified  by  (15). 
Newton's  method  requires  R~l  which  is  generally  not  available.  An  attempt  to 
perform  an  algorithm  like  equation  (1.5),  only  using  an  R"1  estimated  from  input 
data,  has  been  reported  by  Griffiths  and  Mantey  [6]  and  is  summarized  in  sec¬ 
tion  2.2  of  this  report. 

For  now,  we  shall  focus  our  attention  on  equation  (1.5),  realizing  that  such 
an  algorithm,  a  true  Newton's  method  version  of  LMS,  is  a  mathematical  idealiza¬ 
tion.  It  can  be  shown  to  have  the  following  properties.  Instead  of  there  being  a 
number  of  time  constants  of  the  mean  square  error  learning  curve  equal  to  the 
number  of  weights  (as  with  conventional  LMS),  there  is  a  single  time  constant 
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given  by 


Tm"  '  4 pKo.  ' 


The  misadjustment  of  algorithm  (1.5)  is  given  by 


M  =  fj.  trace  R  - 


n 


The  bounds  on  /i  for  convergence  in  the  mean  are 

1 


->  (j.>  0  . 


(1.6) 


(1.7) 


(IB) 


Comparing  the  two  algorithms,  we  make  their  fj.  values  equal  in  order  to 
have  equal  misadjustments.  Immediately  we  see  that  the  stable  range  of  (J.  for 
steepest  descent  is  smaller  than  for  Newton’s  method  when  there  is  eigenvalue 
disparity.  Since  these  algorithms  are  generally  operated  with  small  /jl  to  main¬ 
tain  small  M,  this  is  not  necessarily  disadvantageous  for  steepest  descent.  How¬ 
ever.  when  the  eigenvalue  spread  is  extreme,  steepest  descent  piay  be  forced  to 
operate  with  a  very  small  value  of  p.  in  order  to  maintain  stability.  Under  such 
circumstances,  steepest  descent  would  be  stability  bound  rather  than  misad¬ 
justment  bound. 

With  equal  settings  of  /*,  both  algorithms  have  the  same  misadjustment. 
Under  these  circumstances,  it  is  interesting  to  compare  the  Newton’s-method 
time  constant  (1.6)  with  the  steepest-descent  time  constants  (1.3).  It  is  clear 
that  some  of  the  steepest  descent  convergence  modes  are  going  to  be  faster 
while  some  are  going  to  be  slower  than  the  single  Newton’s  method  mode.  If  we 
compare  areas  under  the  learning  curves  in  order  to  compare  "learning  times" 
of  single  mode  exponential  curves  with  multimode  curves  (as  is  done  in  Fig.  l.l), 
it  can  be  shown  that  when  misadjustment  bound,  the  learning  time  of  steepest 
descent  averaged  over  random  initial  conditions  is  identical  to  the  learning  time 
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MEVrrON:*t'MSE  -  100  jti  kations 

S-D:  FAST  't'MSE  ■  5^  ITERATIONS  lx  «2.9(10>‘* 
SL0W^mse  -  262  ITERATIONS 


Fig.  1.1.  Learning  curves  for  Newton's  method  and 
the  method  of  steepest  descent  (S-D). 
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of  Newton's  method.  However,  one  should  realize  that  the  worst  case  learning 
time  for  steepest  descent  will  be  worse  than  that  for  Newton's  method  by  a  fac¬ 
tor  of  \mex/  X™ 

The  behavior  of  the  steepest  descent  LMS  algorithm  has  been  analyzed  in 
detail  in  [4]  with  a  simple  form  of  nonstationary  input  that  results  in  the  qua¬ 
dratic  mean  square  error  function  undergoing  a  random  vector  displacement. 
The  motion  of  the  bottom  of  the  bowl  is  first  order  Markov.  Misadjustment 
results  both  from  noise  in  the  weights  and  from  the  weights  dynamically  lagging 
behind  the  bottom  of  the  moving  mean  square  error  bowl.  It  is  shown  that  the 
total  misadjustment  is  minimized  when  the  rate  of  adaptation  is  adjusted  (by 
choice  of  /x)  so  that  both  components  of  misadjustment  are  equal.  A  similar 
analysis  has  been  made  for  LMS  Newton,  and  it  has  been  found  that  the  value  or 
/x  that  optimizes  steepest  descent  also  optimizes  Newton  s  method  and  that  both 
algorithms  yield  the  same  misadjustment  for  the  same  /x.  The  conclusion  is  that 
if  the  steepest  descent  algorithm  is  misadjustment  bound  rather  than  stability 
bound,  the  conventional  steepest  descent  approach  gives  identical  performance 
in  a  statistical  sense  to  Newton's  method  with  simple  nonstationary  inputs. 

The  Newton's  method  version  of  the  LMS  algorithm  is  about  as  effecient  as 
an  algorithm  can  be,  from  the  standpoint  of  statistical  performance.  For  a  given 
number  of  weights  and  for  a  given  level  of  misadjustment,  the  number  of  data 
samples  seen  and  consumed  in  the  convergence  process  of  LMS  Newton  is  about, 
as  small  as  nature  will  permit.  Justification  for  this  comes  from  study  of  adap¬ 
tive  behavior  when  learning  with  a  finite  number  of  data  samples. 

It  is  shown  in  Appendix  A  of  reference  [4]  that  when  training  an  n-weight 
adaptive  system  with  N  independent  data  vectors,  the  expected  misadjustment 
is 

jy  _  n__  number  of  weights _  ^  0^ 

N  number  of  training  samples  ' 


This  result  was  first  reported  without  a  complete  proof  in  [l].  It  is  independent 
of  algorithm,  as  long  as  the  algorithm  is  least  squares.  A  few  simplifying 
assumptions  were  made  to  arrive  at  such  an  elegant  and  simple  result.  This  for¬ 
mula  has  been  tested  extensively  by  computer  simulation  and  has  been  found  to 
be  quite  accurate  for  misadjustments  of  25%  or  less. 

Misadjustment  formula  (1.9)  may  be  compared  with  that  for  LMS  Newton 
(1.7).  The  comparison  cannot  be  exact  however,  because  (1.9)  applies  to  learn¬ 
ing  with  a  finite  block  of  data  while  (1.7)  applies  to  a  steady  flow  learning  pro¬ 
cess.  Actually,  (1.9)  applies  to  steady  flow  learning  with  a  uniform  moving- 
average  window  while  (1.7)  applies  to  steady  flow  learning  with  an  exponential 
moving  window.  Reconsider  equation  (1.7).  It  can  be  written  as 


(1.10) 


One  can  read  this  as  "misadjustment  of  LMS  Newton  equals  the  number  of 
weights  divided  by  the  number  of  training  samples,"  if  one  considers  that  an 
exponential  process  essentially  settles  within  four  time  constants  and  that  any 
input  that  has  occured  more  than  four  time  constants  ago  would  have  negligible 
effect  on  the  weights.  We  conclude  that  LMS  Newton  has  the  misadjustment  of  a 
fundamental  least  squares  process.  No  adaptive  process  could  have  a  lower 
misadjustment  than  (1.9)  or  (1.10).  No  more  "information"  (in  the  common 
English  sense)  can  be  squeezed  from  a  given  amount  of  data. 

We  now  know  that  the  Newton's  method  version  of  LMS  is  as  fine  and 
efficient  an  adaptive  algorithm  as  can  be.  Unfortunately  it  cannot  be  imple¬ 
mented  unless  one  perfectly  knows  R~x.  Without  such  knowledge,  one  can  only 
approximate  LMS  Newton.  In  application  to  adaptive  FIR  digital  filters,  perhaps 
this  or  something  approximating  this  is  achieved  by  the  adaptive  lattice  filter 
algorithms  that  have  appeared  during  the  last  half  dozen  years  or  so  Whether 


or  not  this  is  true  remains  to  be  seen. 


With  extreme  eigenvalue  disparity,  stability  may  be  of  limiting  concern 
rather  than  misadjustment.  To  achieve  maximum  convergence  speed  with  LMS 
steepest  descent,  fj,  would  be  set  to  1/  causing  the  slowest  mode  to  have  a 
time  constant  of  V**/  4Xmlri  adaptations.  Operating  steepest  descent  at  full 
speed,  the  misadjustment  would  be  U  =  traceR/\lntx>  1.  LMS  Newton,  doing 
the  same  job.  could  be  pushed  much  faster.  Maximum  speed  would  be  achieved 
by  setting  p.  to  half  of  its  upper  stable  limit,  i.e.  p  =  1/  2^^, .  giving  theoretical 
(no  gradient  noise)  convergence  in  one  iteration.  As  such,  its  misadjustment 
would  be  M  =  n/2.  A  100  weight  filter,  for  example,  would  have  a  misadjustment 
of  50. 

In  most  engineering  applications,  a  misadjustment  of  100%  would  be  con¬ 
sidered  very  high.  The  adaptive  solution  then  gives  twice  as  much  mean  square 
error  as  the  Wiener  solution,  i.e.  3  dB  more  mean  square  error.  Only  when  the 
minimum  mean  square  error  of  the  Wiener  solution  is  zero  or  very  small  would 
high  misadjustment  be  acceptable.  Recall  that  misadjustment  is  normalized 
relative  to  minimum  mean  square  error. 

In  most  engineering  applications,  a  misadjustment  of  about  10%  would  be 
satisfactory.  As  such,  neither  LMS/steepest  descent  nor  LMS/Newton  would  be 
pushed  anywhere  near  to  the  brink  of  instability.  Speed  of  convergence  would 
then  be  misadjustment  limited  rather  than  stability  limited,  regardless  of  eigen¬ 
value  spread. 

It  is  safe  to  say  that  the  steepest  descent  version  of  the  LMS  algorithm  is 
the  simplest,  most  widely  used,  and  most  widely  understood  of  all  adaptive  algo¬ 
rithms.  With  all  eigenvalues  equal,  its  performance  is  identical  to  that  of  LMS 
Newton.  With  eigenvalue  disparity,  its  average  speed  of  convergence  and  statist¬ 
ical  efficiency  and  performance  with  nonstationary  inputs  is  identical  to  that  of 


12 


*  ' 


% 


LMS  Newton,  except  that  its  worst  case  convergence  rate  is  poorer. 

When  dealing  with  an  input  signal  buried  in  noise,  the  eigenvalue  spread  is 
slight  and  one  can  be  assured  that  LMS  steepest  descent  will  perform  as  well  or 
better  than  any  other  algorithm.  When  the  input  is  stationary,  noise  free  or  only 
slightly  noisy,  and  when  the  input  signal  is  narrow-band  with  a  highly  peaked 
spectrum,  eigenvalue  disparity  could  be  large  or  extreme.  Under  these  cir¬ 
cumstances,  opportunities  will  exist  to  better  the  worst-case  performance  of  the 
steepest  descent  form  of  the  LMS  algorithm. 


\ 
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2.1  INTRODUCTION 

The  preceding  discussion  points  out  that  comparisons  between  adaptive 
algorithms  can  be  made  on  the  basis  of  misadjustment  and  speed  of  conver¬ 
gence.  Often  the  jammers  to  be  nulled  are  of  a  time-varying  nature.  In  this 
case,  the  concepts  of  misadjustment  and  speed  of  convergence  have  appropriate 
counterparts.  For  the  time  varying  case,  one  can  view  the  algorithm  as  being 
repeatedly  restarted  with  only  a  few  input  data  samples  available  to  estimate 
the  weight  values  required  to  null  the  changing  jammer.  As  such,  the  speed  of 
convergence  of  an  algorithm  relates  to  how  well  the  algorithm  'tracks'  or  follows 
a  change.  If  a  jammer  changes  position  or  changes  spectral  character,  then  the 
optimal  weights  required  to  maintain  the  best  possible  jammer  null  must  also 
change.  Tracking  error  refers  to  the  difference  between  the  average  value  of 
the  adapting  weights  and  the  optimal  weights  at  each  time  instant.  In  a  nonsta¬ 
tionary  environment,  poor  tracking  implies  that  the  ensemble  mean  of  the 
adapting  weight  vector  is  far  from  the  optimal  weight  vector  which  would  pro¬ 
duce  the  best  possible  null.  Weight  misadjustment  is  related  to  the  noise  in  the 
adapting  weights.  If  an  adaptive  algorithm  is  tuned  to  track  a  time-varying  jam¬ 
mer.  it  will  exhibit  some  amount  of  weight  noise.  In  fact,  tracking  ability  and 
weight  noise  are  directly  related.  The  better  the  tracking  performance,  the 
higher  the  weight  noise.  Ideally,  one  would  like  an  algorithm  which  is  capable  of 
tracking  a  quickly  varying  jammer,  yet  show  little  weight  noise.  Part  2  of  this 
report  develops  several  algorithms  which  show  improvements  of  this  kind  over 
conventional  LMS. 

This  section  begins  with  a  discussion  of  the  recursive  least  squares  algo¬ 
rithm  for  adaptive  antennas.  The  algorithm  will  be  developed  in  an  exact  least 
squares  sense,  then  a  stochastic  interpretation  of  the  functionihg  of  the  algo¬ 
rithm  will  be  given.  The  tradeoffs  between  misadjustment  and  tracking  will  be 
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shown  by  actual  simulations.  A  new  algorithm  will  then  be  introduced  as  a 
method  to  improve  the  misadjustment /tracking  tradeoff.  Derivations  of  optimal 
strategies  for  use  of  data  will  then  be  made,  and  the  circumstances  under  which 
the  recursive  least  squares  algorithm  is  optimal  will  be  discussed.  Comparisons 
between  the  new  algorithm  and  current  techniques  will  then  be  presented.  The 
goal  is  to  obtain  optimal  adaptive  array  performance  with  nonstationary  inputs 
consisting  of  signal,  noise,  and  jammers. 


1  <*>Y*  •,  . 


2.2  derivation  ofwkls 


The  weighted  recursive  least  squares  algorithm  (WRLS)  analyzed  here  has 
been  previously  described  by  others  [6].  This  discussion  is  included  in  order  to 
establish  terminology  and  to  introduce  recursive  algorithms  that  can  be  applied 
to  adaptive  antenna  arrays. 

Consider  the  Zahm  beamformer  of  Fig  2. 1.  This  antenna  array  is  adapted  so 
that  the  filtered  output  of  the  auxiliary  elements  y  (f)  matches  as  closely  as  pos¬ 
sible  the  output  of  the  primary  element  d(f).  This  results  in  an  antenna  array 
sensitivity  pattern  which  is  as  omnidirectional  as  possible  while  maintaining 
nulls  on  the  stronger  incident  signals.  For  the  array  of  Fig.  2.1,  the  filtered  out¬ 
put  from  each  auxiliary  element  is 

Viis.t)  =  jg  (2.1) 

The  weights  are  now  assumed  to  be  a  function  of  the  parameter  s  which  will  soon 
be  used  to  determine  an  optimality  criterion.  Define  the  vectors, 

*7(0  =  [^..(O  •  •  •  iui.n(0]7'  .  (2.2) 

*7(0  =  [xt(f),*i(f-l),  •••  .^(f-rOr  •  (2.3) 

In  vector  notation,  Eq.  2.1  becomes 

j*(s,0  =  *f(s)Ai(0  ■  (2.4) 

The  summed  output  of  the  auxiliary  elements  is 
Y(s.t)  =  g  Vi(s.t) 

=  g  *7(0*t(0  •  (2.5) 

t»l 

To  simplify  notation  later,  introduce  the  combined  vectors, 


Figure  2.1.  Zahm  adaptive  beamformer. 


which  reduces  to 
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jf(0  = 


S  a*(0*(iUr(0 


li  =  l 


1  £  «i(0<i(iU(0 
<*1 


(2.13) 


The  optimal  choice  of  M(t)  is  given  by  Eq.  (2.13),  this  value  will  minimize  the 
weighted  error  e'(f )  of  Eq.  (2.9). 

The  basic  problem  associated  with  finding  M(t)  is  that  for  each  time  step, 
an  (n  +  l)m  by  (n+l)m  matrix  must  be  inverted.  For  notational  convenience, 
define  the  matrix 


P( 0  =  ^  aHt)X(i)XT(i) 


(2.14) 


To  find  M(t  +l),  the  matrix  P(t  +  1)  must  be  found.  Since 


F(*  +  l)  =  ca(t  +  l)X(i)XT(i) 


ai{t+\)X(i)XT[i)  +  a<M(f  +l)JC(f +  l)JC7’(f +  1) 


-1 


(2.15) 


Making  certain  assumptions  about  the  form  of  the  weighting  a*(0  results  in  a 
simple  recursion  in  which  P(t  + 1)  can  be  found  from  P{t)  in  order  (m(n  +  l))z 
operations.  Specifically,  choosing 


o i[t)  =  a1-1  i=l.  2 . t 


(2.16) 


results  in  the  widely  used  exponential  weighting  scheme,  where  the  most  recent 
error  sum  (2.9b),  makes  the  greatest  contribution  to  the  running  mean  square 
error,  and  it  has  the  greatest  effect  on  the  current  value  of  M(t ).  Also,  errors  in 
the  distant  past  make  a  relatively  small  contribution  to  e  (f ),  and  therefore  have 
a  reduced  effect  in  the  determination  of  M{t).  Figure  2.2  shows  how  the  weight¬ 
ing  given  by  (2.16)  can  be  interpreted  as  a  sliding  exponential  window  on  the 
mean  square  error.  Assuming  an  exponential  form  for  ai(£ )  gives 


«*(<+!) 


ftf*1-*  i  =  1,  2 . t 

1  i =t  + 1 

a  Qt(f)  i  =  i,  2 . I 

1  i  =  f +  1 


(2.17) 


Relative  1.0 

Importance 

of 

e(1) 


Most  Recent 
Error  Weighting 


Figure  2.2.  Exponential  weighting  of  e(i). 
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so  Eq.  (215)  reduces  to 

/>(*  + 1)  =  (a  £  ai{t)X(i)XT(i)+X{t  +  \)Xrit  +  lj 

l  <=i  J 

=  (aMO+^  +  lW'+l)]  '  • 


(2.18) 


The  particular  choice  for  a  value  of  a  will  be  made  later.  For  now,  assume  that  a 
is  known.  Equation  2.18  can  be  simplified  using  the  matrix  inversion  lemma 
(Appendix  A)  to 


P(f+1) 


J_LfM  -  />«)*(* +  0^1 

a\  V  '  a  +  XT{t  + 1  )P{t  + 1) 


(2.19) 


Equation  (2. 19)  gives  a  recursive  update  for  P(t)  which  is  far  simpler  than  per¬ 
forming  a  direct  matrix  inversion  at  each  step.  Also,  due  to  the  form  of  a,(<) 
the  update  for  W(t )  becomes 


JK(f  +  l)  =  P(*+l) 


»£  ax(t)d(i)X[i)  +  d(t+l)X(.t  +  l)  .  (2.20) 

t=i 


This  expression  can  be  combined  with  Eq.  (219)  to  yield  a  recursive  update  for 

*(/): 


jr(<  +  i)  =  ju(0  + 


P(t)X{t  +  l)(d(H-l)  -  Jfr(f)X(f  +  l)] 
a  +xT{t+:)P{t)X[i  +  :) 


(2.21) 


Appendix  B  contains  all  intermediate  steps  for  deriving  Eq.  (2.21)  from  Eq. 
(2.20).  Equations  (2.19)  and  (2.21)  are  the  update  recursions  for  the  WRiS  algo¬ 
rithm. 
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2.3  klSADJ  USTMENT  AND  TRACKING  OK  WKLS 

The  IffRLS  algorithm  given  by  Eqs.  (2.19)  and  (2.21)  was  implemented  on  a 
two-element  Zahm  beeupformer  shown  in  Fig.  2.3.  The  two  element  array  had 
one  incident  time  varying  jammer,  which  generated  a  wavefront  according  to 

*(0  =  ac(Oe(OWO«(f-i)  .  (2.22) 

where  <(f )  is  a  white-noise  sequence.  Note  that  if  ac(f)  and  o,(f )  are  constants, 
the  jammer  is  stationary.  Allowing  a  time  varying  a0(f)  and  ai(0  makes  the 
spectral  character  of  the  jammer  change  over  time.  The  array  spacing  s  and 
the  jammer  angle  of  incidence  were  chosen  so  that  a  time  delay  of  one  sam¬ 
pling  period  existed  between  the  signals  received  by  the  primary  and  auxiliary 
elements.  In  the  notation  of  Fig.  2.3,  this  is 

x(t)  =  d{t- 1)  .  (223) 

Specializing  the  situation  to  that  described  by  (2.23)  allows  an  easy  calculation 
of  how  the  algorithm  would  optimally  behave.  This  optimal  behavior  can  be 
found  as  follows.  The  array  output  is  given  by 

c(f)  =  d  -w{t)x(t) 

=  d{t)  -  w{t)d(t-l)  .  (2.24) 

The  input  is  known  to  be  generated  by  a  process  given  by  (2.22).  An  expression 
for  vi(f)  in  terms  of  a„(t)  and  a,(f )  can  be  found  which  represents  the  optimal 
choice  of  ui(f)  to  minimize  E{t{t))  for  every  f,  since 

4*(o)  =  E[{d{t.)-w{t)d(t-i)y] 

=  E[w(t  -n)  -  ui(f  )lf(t  — n.-l)j2 
=  (f-n)e(f  -n)  +  a,(f  -n)9\t  -n -l)j 
-  to(/)|a,(f-n-l)c(t  -n.-l)  +  a^f-n-l)®^  — n-2)J* 

=  (-.V  ~n)  T  a -n)  -  2u-  ;!)(aj(f  -n)o, (f  —n 
+  af(f-n-l)] 


(2.25) 


le,  one  weight  Zahm  adaptive  array. 


24 


Taking  derivatives  and  equating  to  zero  gives  the  optimal  w(t),  denoted  by 
n/(f).  as 

0  =  -2|a,(t  — n)^  (f -n-l)j  +  2tiJ#(t)[aoZ(^ -n-l)  +  af  (f -n-l)j  ,(2.26) 


or 


«•<*) 


djit  -71)00(^-71-1) 
a£(t  -n-l)  +■  af(f-n-l) 


(2.27) 


This  formula  for  the  optimal  weight  value  requires  complete  statistical 
knowledge  about  the  jamming  signal  and  is  used  only  as  a  benchmark  by  which 
to  compare  practical  adaptive  antenna  algorithms  such  as  WRLS.  The  WRLS 
algorithm  (2.19,2.21)  was  used  to  estimate  n/(£)  and  plots  are  shown  comparing 
tu(£)  to  ui*(f )  (Fig.  2.4). 

The  plots  in  Fig.  2.4  illustrate  the  tradeoff  between  weight  tracking  speed 

and  weight  estimate  variance.  Figure  2.4a  is  a  plot  of  the  optimal  weight 

trajectory.  Figures  2.4b, c,d  are  plots  of  the  adapting  u>(£)  for  progressively 

• 

larger  values  of  a.  the  decay  rate  factor.  A  graph  of  at_t  is  shown  with  each  plot 
to  give  some  visual  idea  of  the  relative  error  weightings.  Note  that  decreasing  a 
(placing  more  importance  on  new  data)  leads  to  better  average  tracking  of  the 
true  parameter,  but  the  adapting  weight  variance  (misadjustment)  is  quite 
large.  Conversely,  increasing  a  (corresponding  to  a  decrease  in  the  importance 
placed  on  new  data)  the  algorithm  lags  behind  and  tracks  the  optimal  weight 
poorly,  but  the  adapting  weight  value  has  a  low  variance.  The  adapting  weight  of 
Fig.  2.4b  is  considered  to  have  high  miadjustment,  while  the  adaptive  weight  of 
Fig.  2.4d  has  poor  tracking  qualities.  In  typical  applications  of  this  algorithm,  a 
choice  of  a  is  based  on  the  tradeoff  between  these  two  properties.  Such  a  choice 
is  shown  in  Fig.  2.4c,  where  both  low  misadjustment  and  good  tracking  proper¬ 
ties  are  obtained.  In  the  sections  to  follow,  we  develop  a  new  class  of  algorithms 


variance  and  tracking  for  different  choices  of  a 
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(2.31)  converges  to  an  estimate  of  the  autocorrelation  matrix  /?* 


[4  £*t(0|  Et  on(t)X(i)XT(i)  =  [4  at(f)]  *4  at(0£W)*r(i)] 

(i  =  l  <=I  4=1 

=  *«[4  04(0] '  [4  04(f) 


=  /?. 


(2.32) 


Similarly  the  last  term  in  (2  31)  converges  to  an  estimate  of  the  cross¬ 
correlation  matrix 


a«(f) 

E  4  a*(f)<f(t)X(i)=  [4  04(f)] 

4 

ai(t)E(d(i)X(i) 

4  =  1  t  =  l  J 

1=1 

=  pJi  a4(f)' 

-1 

£  «t(f)| 

[4=1  j 

j 

=  P» i 


(2.33) 


Hence,  in  the  stochastic  sense  the  weight  vector  is  given  by 

•ff(0  =  (2.34) 

•A.  A 

where  ff«(f )  and  PXd{t )  converge  in  the  mean  to  the  true  R„  and  values  if 
the  input  signals  are  stationary.  This  is  consistent  with  Wiener  filter  theory. 

In  the  case  that  the  input  signals  are  not  stationary  but  are  time-varying, 
the  quantities  R ^(t)  and  P^}{t)  are  sample  mean  estimates  of  the  true  time- 
varying  covariance  matrix  and  cross-covariance  vector.  This  is  an  important 
point  to  be  remembered  for  the  discussion  that  follows.  In  situations  where  the 
input  signals  are  nan-stationary,  the  true  covariance  quantities  R„  and  P^ 
will  be  functions  of  time  denoted  by  /?**(f )  and  P^(t  ). 

Now,  components  of  the  iterative  relations  (2.19)  and  (2.21)  can  be  com¬ 


pared  with  those  of  the  LMS/Newton  algorithm.  Normalizing  (2.35)  gives  the  fol¬ 
lowing  recursive  relations: 
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Jf(/+l)  =  M{t)  + 

=  Jf(0  + 


o  +  *r(f  +  l)P(0*(*  +  l) 


P(t)X(t  +  l)(d(t  +  l)  -Mr(t)X(t  +  l)) 


Ka+XT{t  +  i)fCP(t)X{t  +  l) 


KP{t)X(t  +  l){d{t  +  l) 


-  Mr{t)X(t  +  l)) 


(2.35) 


where  the  normalized  constant  K  is  chosen  to  be 

K  =  t  ««(0  ■ 

<31 

For  the  case  of  exponential  aj(f)  this  reduces  to 
K  =  t 

i=l 

1  -a* 

1  -  a 


(2.36) 


(2.37) 


This  choice  of  K  gives 

KP(t)  =  P«l(0  (2.38) 

Hence  the  algorithm  (2.19)  and  (2.21)  constitute  a  form  of  LMS/Newton  algo¬ 
rithm,  with  an  adaptation  coefficient  which  is  time  variable; 


Pit) 


1 


Ka  +  2:T(f  +  l)  Pal(t)X(t+l) 
1 


&  ■_  +  x7(t +i)R£(t  yx(t  -n) 


(2.39) 


So  this  gives  the  ff(0  adaptive  v^date  algorithm  as 

jr(*+i)  =  n:(f)  +/4t)ii'(fW+i)[d(i+i) -.Kr(f)jr(f  +  i)).  (2.40) 


which  is  quite  similar  to  the  LMS/Newton  form  given  in  Part  1.  An  interpretation 
for  the  action  of  the  can  be  found  by  looking  at  the  XT(t  +  l)P^{t)X{t  +  l) 
term.  If  X(t)  is  assumed  to  be  a  zero  mean gaussian  process,  then  the  probabil¬ 
ity  of  a  given X(t)  occurring  is 
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PW)  =  *]  = 


_ 1 _  ~XT[t)R^X[t) 

(2jt  )n/2(  det  R„)* 


(2.41) 


Where  R„  is  the  covariance  matrix  of  the  multivariate  density.  Since  R„  is  a 
positive  definite  matrix,  excursions  of  X{t)  from  the  mean  are  decreasingly 
probable  Also,  since  X T  R^X  increases  for  less  probable  X(t),  fi(t)  is 
decreased.  Hence  /a(0  can  be  thought  of  as  automatically  scaling  the  step  size 
by  the  relative  probability  associated  with  a  particular  value  otX(t). 

As  is  clear  from  the  previous  development,  the  WRLS  and  LMS/Newton  algo¬ 
rithms  both  perform  an  exponential  weighting  of  incoming  data.  Single  mode 
exponential  weighting  is  refered  to  here  as  a  first  order  process.  This  terminol¬ 
ogy  comes  from  the  form  of  the  recursion  used  to  generate  exponential  weight¬ 
ing.  For  this  reason  WRLS  and  LMS/Newton  are  called  first  order  adaptive  algo¬ 
rithms.  In  the  next  section  we  explore  the  possibilities  of  higher  order  data 
weighting  schemes  and  the  resulting  higher  order  adaptive  algorithms. 


1  -*•  ,«■,  ' 
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2.5  ALTERNATIVES  TO  EXPONENTIAL  DATA  WEIGHTING 

This  section  develops  the  intuitive  justification  for  why  nonexponential  win¬ 
dowing  coefficients,  otj(f).  are  desirable.  A  particular  choice  of  the  a i(t)  can  be 
made  by  defining  a  suitable  performance  measure  and  by  assuming  some 
knowledge  of  the  nonstationary  character  of  input  signals,  jamming  interfer¬ 
ences,  and  noise. 

An  exponential  form  for  the  windowing  coefficients  was  assumed  in  section 
2.2  in  the  development  of  the  WRLS  algorithm.  The  exponential  form  leads  to  a 
simple,  recursive  way  of  estimating  P{t ),  the  inverse  sample  covariance  matrix. 
The  unnormalized  expression  for  P-1(f )  is 

P~\t~  1)  =  *»(*  + 1) 

=  t  <n(t)x(i)xT(i) 

<=i 

=  t  a*-'X(i)XT(i)  (2.42) 

t=i 

This  expression  can  be  cast  into  a  first-order  recursive  difference  equation  form 
as 

R„(t  +1)  =  aR„(t)  +  X{t)XT{t)  .  (2.43) 

a 

A  more  general  class  of  estimators  for  R  a{t)  can  be  obtained  by  allowing 
higher  order  autoregressive  moving  average  (ARMA)  difference  equations  of  the 
form, 

R„(t+1)  =  0,^(0  +  ••  4  Rzs  ( t  —n. )  +b0X(t)Xr(t) 

+  •••  4-  bmX{t-m)XT(t-m)  .  (2.44) 

Hiis  ARMA  representation  of  the  estimator  has  time  invariant  coefficients  and 
can  therefore  be  viewed  as  a  linear,  time  invariant  filter  acting  on  X(t)XT{t),  to 
produce  /?„(£)  The  z-transform  of  the  process  given  by  (2.44)  is 


A(z)R„(z)  =  £(z)X(z)Xr(z) 


(2.45) 
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where  A{z)  and  B{z )  are  polynomials  of  the  form 

A(z)  =  1  -  a,z~l  -  a2z-8  -  •••  -  a^z (2.46) 

f?(2)  =  60  +  b,z-1  +  b2z_z  +  •  +  bmz-m  ,  (2.47) 

and  z~l  represents  a  unit  delay.  Equation  (2.45)  can  be  rewritten  as 

*»(*)  =  f(ff  *(«)*T(*>  •  (2  «) 


The  polynomial  ratio  B{z)/  A(z)  represents  the  z-transform  of  a  filter  with  input 
X{z)XT  and  output  R„(z).  Figure  2.5  illustrates  the  idea.  The  B{z)  polynomial 
possesses  m  roots.  These  roots  are  commonly  referred  to  as  ’zeroes'  of  the 
filter.  Similarly  the  A(z)  polynomial  has  n  roots  called  the  ‘poles’  of  the  filter. 
These  names  come  from  the  effect  a  root  of  the  polynomial  has  on  the  filter 
transfer  function, 

"<*: )  =  ■  (2  49) 

A  root  of  B(z)  will  cause  a  zero  in  H(z),  a  root  in  A{z)  will  cause  an  infinite 
value  (or  pole)  in  the  value  of  H(z).  This  z-transform  notation  has  been  intro¬ 
duced  so  that  the  frequency  response  of  a  filter  may  be  easily  understood.  The 
free  variable  z  is  a  complex  number,  which  when  given  the  value, 

z  =  e*"r  (2.50) 

and  substituted  into  (2.49)  gives  a  complex  number  for  H{z).  The  magnitude  of 
this  number  represents  the  magnitude  of  the  response  of  the  filter  to  a 

sinusoidal  input  at  frequency  /  =  .  Figure  2.6  shows  this  idea. 

A  frequency  domain  interpretation  of  (2.48)  lends  valuable  insight  into  the 
significance  of  allowing  a  higher  order  estimator.  The  z-transform  of  the  first 
order  estimator  of  Eq  (2.43)  is 
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Figure  2.5.  Representation  of  the  general  estimator  as  a  linear  filter. 


Unit  Power  Input 
Sinusoid  of  Frequency  ^ 


Transfer  Function  H(z) 

Figure  2.6.  Evaluation  of  frequency  response  of  a  linear  filter. 
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(1-a* -*)*»(*)  =  X[z)XT(z)  (2.51) 

or 

*«<*>  = 

This  is  the  same  transform  that  was  performed  to  arrive  at  (2.48).  This  is  shown 
•  in  Fig  2.7.  The  form  pf  (2.52)  tells  us  that  the  exponential  window  acts  as  a  sin¬ 
gle  pole  filter.  This  single  pole  estimator  is  a  low-pass  filter  with  cutoff  a.  Intui¬ 
tively,  this  means  the  estimator  passes  low  frequency  (slowly  varying)  com¬ 
ponents  of  the  X(t)Xr(t)  sequence,  while  rejecting  high  frequency  (quickly  vary¬ 
ing)  components  of  X(t )Xr(t ).  A  plot  of  the  response  of  the  filter  described  by 
(2.52)  as  a  function  of  frequency  is  shown  in  Fig.  2  8  Note  that  the  general  esti¬ 
mator  of  Eq.  (2.48)  allows  many  poles  and  zeroes  Because  of  this,  proper  choice 
of  the  dj  and  6t  coefficients  in  (2.46)  and  (2.47)  will  place  dips  and  peaks  in  the 
frequency  response  of  the  filter.  In  the  case  of  the  single  pole  estimator,  the 
only  choice  to  be  made  was  where  the  frequency  response  starts  to  drop.  The 
more  general  formulation  with  its  increased  degrees  of  freedom  allows  a  fairly 
arbitrary  frequency  response 

The  usefulness  of  the  more  general  formulation  can  be  pointed  out  with  a 
simple  example.  Recall  that  in  situations  where  the  jammers  are  nonstationary, 
the  covariance  matrix  will  be  dependent  on  time.  Suppose  it  is  known  that 
R„(t)  as  a  function  of  time,  has  a  certain  spectrum.  A  component  of  the  R^ciO 
matrix  will  typically  have  a  spectrum  as  shown  in  Fig.  2.9a.  A  finite  time  average 
of  the  same  component  of  X(t  )XT {t )  will  usually  have  a  spectrum  like  Fig.  2.9b. 
Generally,  simple  averaging  or  exponential  averaging  (as  in  section  2,2)  of 
Jf(f)Jfr(0  will  not  give  a  spectrum  which  matches  the  true  /?**(!)  spectrum. 
This  discrepancy  arises  because  for  non-stationary  situations,  the  time  averages 
used  to  estimate  f?*»(f)  are  not  the  same  as  the  ensemble  averages  which 


1  -  az' 


X{z)XT{z) 


(2.52) 
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constitute  the  true  /?«(£)  .  Clearly,  something  better  than  a  simple  average 
must  be  used.  Intuitively,  the  estimator  acting  on  the  data  of  Fig.  2.9b  should 
attempt  to  notch  out  those  components  with  small  magnitude  in  Fig.  2.9a.  A 
general  filter  such  as  the  one  in  Fig  2.5  could  do  this  easily,  while  the  single  pole 
filter  of  Fig.  2.7  would  be  a  compromise  at  best.  Figure  2.10  shows  this  pictori- 
ally.  This  is  very  similar  to  the  matched  filter  concepts  used  in  signal  detection 
theory 

For  this  preliminary  investigation,  an  all-zero  filter  approximation  to  the 
pole-zero  form  of  (2.48)  will  be  made.  As  shown  above,  the  first-order  estimator 
for  Rsz{t)  had  two  expressions:  a  single  pole  (autoregressive)  form,  Eq.  (243), 
and  a  multiple  zero  (moving  average)  form.  Eq.  (2.42).  Equation  (2.43)  is  called 
a  parsimonious  representatiion  of  the  estimator,  since  it  requires  only  one 
parameter  (a)  to  describe  it.  Since  practical  considerations  require  a  fixed 
memory  size,  Eq.  (2  42)  cannot  be  implemented  directly.  A  good  approximation 
can  be  made  by  choosing  a  sufficiently  large  m  so  that  am  =  0,  which  gives 

i»(0  =  4  a^X{i)XT(i) 

i=i 

=  t  atHX(i)Xr(i)  .  (2.53) 

1=1  -m 

A  development  for  the  general  autoregressive  model  demonstrates  that  if 

=  a +  anR„{t-n)+X(t)XT(t)  .  (2.54) 

it  can  also  be  estimated  by  a  moving  average  process, 

*»(0  =  t  aiitMiWi)  •  (2.55) 

i=<  -i 

The  representation  is  perfectly  accurate  if  l  =4—1.  and  is  usually  quite  good  if  l 
is  taken  to  be  large  but  fixed.  A  proof  of  this  result  can  be  found  in  [10].  The 
result  of  these  steps  is  that  the  estimator  for  Ra(t)  (2.44)  can  be  represented 


n 

L 


39 


by  the  simple  moving  average  process 

R„(t)  =  £  .  (2.56) 

<*i  -t 

The  cost  of  formulating  the  problem  this  way  is  one  of  a  large  memory  require¬ 
ment  (more  than  is  absolutely  necessary);  the  advantage  is  a  form  for  the  esti¬ 
mator  which  is  readily  calculated. 

Since  Eq  (2.44)  had  constant  coefficients,  the  a,( /)  coefficients  will  depend 
only  on  the  index  t  —i  This  result  can  be  found  in  [  10]  So  define 

c(f-i)  =  o,(0  (2.57) 


which  gives 

*»(*>  =  £  c(t-i)X(i)XT(i )  .  (2.58) 

i=i  -i 

Equation  (2.58)  represents  R „{t)  as  a  convolution  of  the  sequence  c(i)  and 
Jf(ipLT(i)  In  this  context,  the  c  {%)  have  an  interpretation  as  the  impulse 
response  to  the  linear  filter  of  Fig.  2  5  and  can  be  drawn  as  in  Fig.  2.11. 

Up  to  this  point,  the  concept  of  windowing  has  been  applied  only  to  estima¬ 
tion  of  R„(t),  but  Pat(t)  must  also  be  estimated  by  the  same  window, 

P«{t)  =  £  c  (f  — i)jC(i)d  (i)  .  (2.59) 

Note  that  because  arbitrary  c(i)  coefficients  are  allowed,  the  efficient  (order 
((n-H)m)2)  update  will  not  work.  Hence  R„(t)  must  be  found  using  equation 
2.58,  then  inverted  at  each  time  step  A  single  inversion  of  Ra(t)  requires  an 
order  of  ((n  +  l)m)3  operations,  which  is  not  computationally  acceptable.  Com¬ 
putational  issues  will  not  be  considered  presently,  since  the  point  of  this  investi¬ 
gation  is  to  determine  if  allowing  arbitrary  windowing  coefficients  c  (i)  results  in 
an  improvement  of  the  algorithm's  performance.  In  subsequent  investigations, 
the  iterative  inverse  algorithm  [11]  will  be  used  as  a  means  of  alleviating  this 


Impulse  Response  of  Filter  of  Figure  2.5. 


Figure  2.11.  Realization  of  Filter  of  Figure  2.5. 
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2.6  AN  OPTIMAL  WINDOW  CRITEldON 

The  adaptive  algorithm  given  in  section  2.2  (Eqs.  2.19,  2.21)  forms  estimates 
7?x*(0  and  Pxd{t)  Using  these  estimates  the  algorithm  finds  a  weight  vector 
JX(t)  which  will  place  a  null  on  the  jammers  present.  Section  2.4  discussed  the 
fact  that  when  the  jammers  are  time  varying,  Ra  and  become  time  varying 
quantities  as  well.  Therefore,  the  problem  is  reduced  to  estimating  time  varying 
covariance  quantities,  R^ (/)  and  Pgd(t).  as  accurately  as  possible,  to  null  the 
jammer.  It  is  desired  to  form  this  null  quickly,  while  simultaneously  maintaining 
an  acceptable  level  of  weight  variance.  Weight  variance  is  undesirable  since  it 
may  lead  to  signal  cancellation  [12].  Generally,  the  time  varying  jammer 
parameters  are  assumed  to  be  stochastic  processes.  For  this  development,  a 
certain  subset  of  the  jammer  parameters  &t,  will  be  allowed  to  change  in  time, 
and  their  values  will  be  found  as  the  output  of  a  filter  driven  by  white  noise  (see 
Fig.  2.12).  Specifically,  the  jammer  will  have  time  varying  frequency  charac¬ 
teristics  as  shown  in  Fig.  2.13. 

Ra(t)  and  P& (<)  will  be  found  as  weighted  sums  of  past  data,  as  shown  in 
Eqs.  (2.56)and  (2.59),  of  section  2.5.  With  the  above  formulation,  the  idea  is  to 
now  choose  the  window  coefficients  c(t)  so  that  Ra{t)  is  in  some  sense  a  best 
estimate  of  Ra{t).  Mean  squared  error  will  be  used  as  a  measure  of  quantity  of 
the  estimate.  The  problem  then  becomes, 

Minimize  £■[  ||  j?„(f) |*]  .  (2.60) 

For  this  preliminary  investigation,  values  of  c  (i)  will  be  found  which  optimize  the 
quality  in  (2.60),  and  then  used  to  estimate  the  /»j(0  vector.  A  totally  general 
approach  would  be  to  define  some  performance  criterion  which  measures  the 
accuracy  of  both  the  R„{ t)  estimate  and  the  /**(f)  estimate.  This  more  gen- 

a 

eral  development  will  be  pursued  later  this  year.  Since  R„(t)  and  /?»(t)  are 


Time  varying 
jamming  process 
parameter  generation 


T  Output 

Figure  2.12.  Adaptive  antenna  and  assumed  jammer  model  used  in 

estimation  of  time  varying  process  covariance  matrix. 


matrices,  a  matrix  norm  must  be  chosen  to  give  (2.60)  meaning.  A  simple  norm 
could  be  obtained  by  changing  (2.60)  to 

Muju^Tjize  E[trace  (#*»(*)  -  ./?**(0)Z]  (2.61) 

A  solution  for  (2.61)  may  be  approximated  by  assuming  the  array  to  have  only 
one  auxiliary  element,  and  solving 

Minimize  E[<p„{t)  -£„(f))8]  (2.62) 

I*  (01 

where  <p0  (t)  is  the  zeroth  order  time  varying  covariance  lag, 

P.(0  =  £(*?(*)  |tf|)  (2.63) 

and 

5.(0  =  t  c(t-i)*?(i)  •  (2  64) 

In  making  the  above  assumptions,  the  major  simplification  arises  by  performing 
the  minimization  on  the  squared  error  of  the  zeroth  order  lag.  The  assumption 
of  only  one  auxiliary  element  does  not  decrease  the  generality  of  the  result,  but 
does  decrease  the  notational  complexity. 

The  solution  to  Eqs.  (2.62)  through  (2.64)  is  a  simple  result  of  Wiener  filter 
theory.  Let 

Z(f)  =  X\{t)  ,  (2.65) 

and  minimize  (2.62)  by  setting  its  derivative  with  respect  to  the  C(i)  equal  to 

'  \ 

zero.  This  gives 

'  ~  0  =  8^77  (*(<*.(0- 5.(0)*)  j  =  t-l . t 

=  teftSjy  (*(*<0  -  i)  c(t-i)zd))*)  j  =  t-i . t 

=  -  £  c(f-i)Z(»)Z0))  j  =  t  -l,  t  (2.66) 
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£  c(f-i)A^(i)Z(;))  )=  £{%  (0^0)]  .  (2.67) 

Equation  (2.67)  can  be  more  efficiently  stated  in  matrix  notation  as 

R^C  -  Px„  (2.68) 

or 

C  =  .  (2.69) 

where 

Zt{JL  )  =  [*  (t  -i ).  s  (t -l  + 1) . z(t) ]  (2.70) 

/?«  =  E[Z(t)Zr(t))  (2.71) 

=  £[^(<K(0]  (2  72) 

CT  =  [c(I-l),  c(l-2) . c(2)]  .  (2.73) 

Equation  (2.68)  is  a  large.  Toeplitz  set  of  equations.  Typical  values  of  l  are  about 
200.  which  allows  for  a  good  approximation  of  the  AR  part  of  the  estimator  (see 
discussion  following  Eq.  (2.52)).  The  quantities  RM  and  Pj_r  are  found  from  the 
statistics  of  the  time  varying  parameters  of  the  jammer. 

Examples  of  solutions  to  (2.69)  for  certain  assumed  jammer  models  will  now 
be  given. 


K'KT-SVV ,~ 
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2.7  DETERMINING  AN  OPTIMAL  WINDOW  FOR  A  SIMPLE  NON -STATION  ARY  JAMMER 

An  optimal  window  will  now  be  found  for  a  jammer  using  a  single  zero  pro¬ 
cess  to  generate  its  wavefront.  Figure  2.14  shows  the  antenna  configuration,  and 
how  the  jammer  wavefront  is  generated,  The  jammer  wavefront  is  given  by 

H'(f')  =  u{t')  +  al(f')it(f'-l)  (2.74) 

where  u(f )  is  a  white  noise  sequence.  Note  that  the  same  antenna  configuration 
outlined  in  section  2.3  is  used  here.  This  configuration  will  be  used  throughout 
this  part  of  the  report  to  allow  a  common  basis  for  comparison.  The  signal  at 
the  auxiliary  element  would  then  be, 

x(t)  =  it(f'-n)  +  — n  — l)  .  (2.75) 

For  simplicity,  shift  the  time  index  relative  to  x,  so  that  t  ~  t'-n  ,  giving 

x(t)  =  u(t)  +  ax(t)u{t-l)  .  (2.76) 

As  outlined  previously,  ax{t)  is  generated  by  a  time  invariant  stochastic  process. 

This  formulation  of  the  problem  can  now  be  solved  using  the  techniques 
presented  in  section  2.6.  First  form  /?**,  then  find  and  solve  the  resulting 
set  of  linear  equations.  Expressions  for  the  R „  matrix  and  the  vector  will  be 
found  for  each  component.  These  components  will  then  be  reassembled  to  find 
the  matrix  and  vector. 

An  expression  for  /?«  will  be  given  as  a  function  i ,j ,  which  indicate  which 
row  and  column  component  (respectively)  this  expression  represents.  Both  i 
andj  range  from  t-l  to  t  since  P**  is  l  by  l  matrix.  First, 

*«(*■;)  =  fif*  (*)*(») 

-  ^((“(i)  +  a,(i)w(i-l))2(u(y)  +  a,(j)u(y-l))z]  ,  (2.77) 

and  assume  u(f)  is  a  white  noise  sequence, 


r 

L. 


■  »  y+&.. 


Jammer  Wavefront 


W(t) 

Time  Varying 
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Figure  2.14.  Single  zero  jammer  wavefront  generation. 


White  Gaussian  Noise 
2  .  , 


V  =  0 


* (t) 


Figure  2.15.  Model  of  first  order  Harkov  process  generating  a^(t) 


49 


(2.78) 


F[tz(i).uO)3  =  {  JaJ  • 

Multiplying  out  (2.77).  distributing  expectations,  and  substituting  (2.78)  gives 
E[z*{j)z2U))  =  E[uHi)u*(j)]  +  £7[a?(t)uz(i-l)uz(,)]  + 

4£’[a,(i)al(;')u(i}u(7)u('i— l)u(j  -l)]  + 

?(;')«*(» -l)wz(jf)]  +  £’[ap(i)af(;)uz(i-l)u2(;-l)]  .  (2.79) 

Now  assume  a(t)  is  uncorrelated  with  u  and  has  zero  mean.  Denote  the  expec¬ 
tation  operator  as 


E(z)  =  £  . 


(2.80) 


which  gives 

£l**(i)*a0)]  =  «*(i)«4(j>  +  nf(i)u*(i-!)uz(;) 

+  j»Ti -:TuT;^iT 

+  ofOJu^i-lJu2^)  +  a?  UW  \J  )uz;i- l)uz0  -1)  (2  81) 

f\irther.  since  a(f)  is  a  stationary  sequence  the  value  or  /?_**(*  ,j)  depends  only 
on  |i-j  |.  From  (2  8l), 


u*(H-ci?)  +  6u^aJ 

+■  {^(l+o^+af  (i)af  (;)) 
u^(  1  +  Zat+a.  f  (i )  a  ?  (j)) 


|i-7'l=0 
|i— >  1  =  1 
ji->iat2 


(2.82) 


An  egression  for  r**  will  be  found  in  a  similar  way.  Component  i  of  the  P±f 
vector  will  be  denoted  /*,(i).  The  components  are  found  as. 

P.f(i)  =  E(**[i)P,)  (2-83) 

where  i  ranges  from  t  to  f  -f.  Pt  is  the  expected  value  of  the  power  of  x  given 

knowledge  cf  the  value  of  the  stochastic  parameter  a,  (0. 


P,  =  ^(Ola.to) 


•V*- . 


where  q  (f )  is  the  white  gaussian  noise  sequence.  The  value  of  a,(f )  is  a  function 
cf  its  last  value  and  the  random  q{t )  sequence,  hence  the  name  first  order  Mar¬ 
kov.  Figure  2.15  illustrates  the  generation  of  o,(f)  in  the  z-transform  notation 
of  equation  (2.45).  The  autocovariance  of  a,(f  )  is  given  by 

£■[(01(1)0, 0»]  =  a|fc|  (2.92) 

where  k  =  i-j.  A  derivation  of  this  result  is  presented  in  [14].  Note  also  that 
since  q{t )  is  a  gaussian  sequence,  the  output  of  the  linear  filter  a(f )  will  also  be 
gaussian  (this  result  can  be  found  in  [13]).  Because  a j(f)  is  gaussian.  its  fourth 
order  moments  can  be  broken  down  as, 

E[{az{i)az(j))]  =  E[(az{i))E{az{j ))]  +  2[£,[(o1(i)a,(j))]]2  .  (2.93) 

Substituting  (2.92)  gives 

E[a2{i)a\j))  =  1  +  2a*'*l  .  (2.94) 

Using  this  result,  (2.88)  becomes 

fcu(l+^<j)  +  6  ji— j |  =  0 

/?«(i,;)  =  fc„  +  3  +  2azl<->1  | i-j  |  =  1  (2.95) 

(  4  +  2a*!i';'1  IWIM 

and  (2.89)  becomes 

P±,(i)  =  4  +  2a*l<-‘I  (2.96) 

Since  a,(f  )  is  gaussian  ka  =  3. 

There  are  two  major  points  to  be  made  about  (2.95)  and  (2.96).  First,  the 
assumed  first  order  Markov  variation  of  a ,(f)  results  in  an  exponential  window¬ 
ing  form  of  the  c(i)  whenfcu  is  small.  This  result  is  found  by  solving 

C  -  R'jjT'Pz..,  (2.97) 

using  (2.95)  and  (2.S6).  The  c  (i)  turn  out  to  be  r  xponential  with  decay  factor  az. 
The  c(i)  to  not  exactly  follow  decay  for  values  of  i  near  0  or  l,  but  these  effects 


are  due  to  the  finite  length  window  and  are  negligible.  The  point  is  that 
exponential  windowing  of  data  is  optimal  when  the  time  varying  parameters  are 
first  order  Markov.  The  second  point  is  to  interpret  the  meaning  of  the  kurtosis 
of  u,  and  observe  the  effect  ku  was  on  the  optimal  window.  Since  u(t)  is  zero 
mean.  (see  Eq.  2.96)  can  be  viewed  as  a  measure  of  the  variance  of  the  power 
of  u{t).  A  large  Jfc„  means  that  the  power  of  u(t )  undergoes  large  changes  in 
comaparison  to  its  average  power.  A  small  value  of  ku  means  the  u(t)  process 
u(f)  was  small  deviations  about  its  average  power  .  Note  the  effect  this  has  on 
the  window,  large  ku  values  force  close  to  a  tridiagonal  form  which  means 
the  c(i)  coefficients  decrease  slowly.  This  is  intuitively  understandable,  since 
more  data  will  be  required  to  form  a  meaningful  average  of  u(t ).  Small  does 
just  the  opposite  making  the  c  (i)  coefficients  decrease  exponentially. 
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2.8  AN  OPTIMAL,  WINDOW  FOR  A  NON-SIMPLE  NON -STAT1  ON AKY  JAMMER 

This  section  presents  an  optimal  window  for  a  jammer  using  a  multiple  zero 
filter  to  generate  its  wavefront  (see  Fig,  2.16).  The  angle  of  arrival  (^)  is  held 
constant,  and  the  frequency  components  of  the  jammer  are  varied.  The  wave- 
front  of  the  jammer  is  generated 

W(t')  =  a0  (t')u(t)  +  ■■■  +ale(t')u(t'-k)  (2.98) 

where  u(t ’)  is  a  gaussian,  white  noise  sequence.  The  signal  at  the  auxilliary  ele¬ 
ment  is, 

x(£)  =  a„  (f'-m)it(f -m)  +  +  a*  (f  ~m)u(t  -k  — m)  (2.99) 

and  redefining  the  time  index  as  in  Section  2.7  so  that  t  -  t '  -  m  gives 
x{t)  =  a,,  (/ )ix  )  +  •••  +  ak{t)u(t-k) 

=  £  ^(fMf-n)  •  (2.100) 

n=0 

The  coefficients  a^f)  are  time  invariant  stochastic  processes  which  represent 
the  time  varying  nature  of  the  jammer.  As  described  before  (section  2  5),  this 
filter  is  a  moving  average  (all  zero)  type,  which  means  that  for  a  fixed  set  of 
Oi(f ),  z(t)  has  a  frequency  spectrum  with  n  dips  in  it.  The  specific  character  of 
the  dips  is  a  function  of  the  (f ).  hence  if  the  o^f)  are  allowed  to  change,  the 
spectral  character  of  the  jammer  becomes  the  varying. 

An  optimal  window  for  the  data  of  the  above  described  jammer  can  be  found 
by  following  a  similar  collection  of  steps  to  those  of  section  2  7.  First,  form  the 
Rjjl  and  P*jf  quantities,  then  solve  for  the  optimal  window.  The  matrix  and 
Pj,f  vector  will  be  determined  in  a  component-wise  fashion  to  make  the  deriva¬ 
tion  more  clear.  Once  these  component  expressions  are  determined,  they  will 
be  combined  to  form  and  P^r. 

The  R ^  matrix  will  now  be  found.  Let  the  components  of  RM  be  denoted 


T 


Figure  2.16.  Multiple  zero  jammer  wavefront  generation. 


Figure  2.17.  Model  of  order  Markov  process  generating  aft). 
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Rat(i.j).  where  i  is  the  row  index  and  j  is  the  column  index  which  locate  the 
component  in  R Bothi  and  j  range  from  t  -l  to  t  and  is  an  £  by  £  matrix 

=  £^£’(2(i)z(j)|om(i)a7.(j))) 

=  £|£’(22(i)x2(;)|am(i)(in(7))]  (2.101) 


The  last  equality  of  (2.101)  indicates  how  R ^  will  be  found.  R ^  is  formed  by 
first  taking  an  expectation  over  x  given  the  random  Om(i)  coefficients.  Then  an 
expectation  with  respect  to  the  0^,(1)  coefficients  is  taken  giving  /?**.  The 
assumption  that  u.(t)  is  a  gaussian  sequence  in  (2.98)  implies  that  x(£)  is  also 
gaussian  since  it  is  a  linear  combination  of  past  u[t)  values.  This  gives  the 
simplification, 


E[x*{i)z2(i)  |  am(i)an(j)]  =  £’[x8(i)|am(i))£'(x2C7)|(in(j)]  + 

2[fi’[x(i)xO)|an,(i)an(;)])Z  ■ 


:(i)  =  £ 


=  £  £  am(£)u(i-m)an  (1)14(1-71) 


(2.102) 


(2.103) 


(2.104) 


which  results  in 


la^i)]  =  £  £  (2.105) 

n=0  n  =0 

Note  that  ^[^(i-m^^-^n.)]  is  nonzero  only  when  i-m  =i-n.  So  (2  105) 


simplifies  to 


£[**(*)  I  «m(i)]  =  £ 


(2.106) 
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V. 


I: 


Evaluating  the  second  term  in  (2. 102)  gives, 

fl  “m( (*  ~rn)tt  (; -71 )]  (2.107) 

fu  =0  n—  0 

where  E^u(i-m)u(j -n)j  is  nonzero  only  when  i-m  ~j-n.  This  lets  (2.107) 
simplify  to 

JF[x(i)x(;)|a„(i)a„0)]  =  £  “m(i)<Wt(;)^  (2.108) 

m  =0 

where!'  =  i-j  and  l'^k.  Fori>Jk,  E^e  (i  )a:  (J )  It^  (i)a„(  ji)j  =  0. 

Now  evaluate  /?„(i.j)  by  taking  expectations  of  (2.106),  and  (2.108).  This 
will  be  done  in  two  parts.  /?**  is  l  by  l  so  1  =i-j  ranges  from  0  to  l  .  The 
length,  k,  of  the  all  zero  jammer  wavefront  generation  process  is  not  related  to 
the  length  of  the  window,  l .  Hence  k  can  be  greater  or  smaller  than  l .  The  fol¬ 
lowing  development  covers  values  of  for  any  k,l  The  two  cases  to  con¬ 

sider  are  1st  and  i  >  k.  First,  for  I'^k, 


W»30  *1*0  J 

*2  o«(i)om*i0'K(»K+l(i) 

\  m *C  n*0 

Denoting  expectation  by  over-bars  as  in’Eq  (2.80)  of  gives 

E[z(i)z(J)]  =  u*  \  £  £  a£{i)a*(j)  + 

p**0  n=0 

+  2  *2  *2  <  (*  ♦rCr)a»  (i )“„ >') 

m=0  n=0 

Now,  for  l'  >  k , 


(2.109) 


(2.110) 


£[z(i)z(i)]  =  u*  £,  t,  (2-H1) 

m=0  n=0 

To  achieve  a  further  simplification,  assume  that  the  sequences  a^i)  and  ^(j) 
are  uncorrelated  for  m/n  and  that  they  are  zero  mean,  which  makes 
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“m(i)an(;')  =  o  for  m/n  . 


Substituting  this  into  (110)  and  (111)  gives, 


'  UZ 

£  i  a5(*K!0)  +  2  £  «*m(i)a£(i) 

^n=0  n  =0  mMlnxQ 

ft-r  t-r  fc-lfl 

•ur 

E[z(i)z  0')]  =  < 

S  S  ^(*W0)  +  2  s  a^(i)o^+|t.,(y) 

m=0n=0  m=0 

—gz 

u£ 

£  £  (^ )  (j  )] 

Im  =0  n  =0 


(3.112) 


l  =  0 
0  <  |f  |  <k 
(2.113) 


Next  assume  that  the  (i)  are  all  identically  distributed  random  variables, 
which  gives  the  final  simplification.  For  i-j  =  0, 


E[z{i)z{j)]  =  -u? 


3  t  t  t  (^)Z 


m=0 


m=0  n=0 


=  u2®  3(A:+l)(a4  +  ^(a2)2)  . 


(2.114) 


For  |  =  |i'|  s  fc 

E[z(i)z{j)]  =  (u2)2  [£  (az(i)az(j)  -  a?)  +  £  £  a20  +  2  (a2®)] 

tfn  =0  m»0  n=0  m  =  l  J 

=  u*  ((*  +  1)  a z(i)n6(j)  +  *(*+l)a*  +  2(*-|i'|  +  l)a*)  (2.115) 


and  for*  <  |i‘| 


E(z(i)z{J))  =  i?®[£  (aa(i)az(;)  -  a3®)  +  £  £ 

|pn=0  m  *0  m=0 
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=  il5®  [(*+l)a*(t)o#y)  -  {*  +  l)a*  +  (Jb  +  l)«(a®)*)  .  (2.116) 


Since  a(i)  is  a  stationary  sequence,  let 
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<fibb(0  =  a8(i)a *(j )  ,  i  =  i-j  .  (2.11?) 


Combining  2.114,  2.115,  2.116  and  2.117  gives, 


{u?)*  E[z{i)z{j)\  = 


3(fc  +  l)(a4  +  ka?)  f  =  o 

{k+l)<pbbl"  +  (fc2+3fc  — 2(f’)+2)(a5)zl  ^  V  4  k 
(A:  +  l)^»toi,  (f‘)  +  (k2+k)a?  k  <1 


(2.118) 


Next  the  elements  of  P*9,  denoted  as  /^(i),  will  be  found.  The  index  i 
ranges  from  t-l  to  t  and  indicates  where  the  component  belongs  in  the  vec¬ 
tor.  First, 

=  E(z*(i)Pt) 

=  iff[tf(*«(i)^  |q»«))]  .  (3.119) 

where 

Pt  =  £[**<0 1  a*  (01 

=  £  a*(t)u*  .  (2.120) 

m=0 

The  last  equality  is  the  result  of  Eq.  (2.106).  To  evaluate  (2.119),  an  expression 
for  the  conditional  expectation  term  must  be  found.  Using  Eq.  (2. 120)  gives 

E[z*(i)P,  |om(i)3  =  "£,  £  -  (2.121) 

m=0  n=0 

since  the  u{t)  sequence  is  white.  Now  taking  expectation  with  respect  to  0^,(1) 
yields  the  total  expectation, 

E[z*(i)Pt]  =  af  £  £  a£(i)a2(t)  .  (2.122) 

m  =0  n=0 

Now,  as  before,  assume  that  the  0^,(1)  are  uncorrelated,  zero  mean  random  vari¬ 
ables,  which  results  in  the  condition  shown  in  equation  2.112.  Also  assume  that 
the  ^(i)  are  identically  distributed  for  alt  m.  This  leads  to  a  simplification  of 
(2.122)  to 

^((A+l^  +  fcOfc+lji2®)  i  =  *  ,  . 

Uk+l)Vbb(t-i)  +  k{k+l)Z?)  *<*  (2123) 


E[z*(i)Pt)  = 
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Equations  (2.118)  and  (2.123)  represent  component  descriptions  of  and 
Pxj  in  terms  of  the  values  of  a*  a*  and  ^M(i).  These  quantities  are  all  deter¬ 
mined  by  the  dynamics  of  the  at(f )  coefficients.  Since  it  was  assumed  that  the 
Oi(t)  are  independent,  and  identically  distributed,  only  one  model  is  required  to 
find  the  unknown  statistics  of  the  a t(f).  The  random  variable  a(t)  is  used  to 
represent  the  statistics  of  all  the  a<(0  A  good  model  of  how  these  coefficients 
change  is  to  assume  the  a(£ )  is  an  n-th  order  Markov  process,  that  is, 


o(0  =  7,a(f-l)  +■  yna(t-n)  +  q(t)  ,  (2.124) 

where  q(t)  is  a  gaussian  white  noise  with  zero  mean.  Figure  2.17  shows  the  z- 
transform  representation  of  how  a (t)  is  generated.  The  actual  ot(0  sequences 
used  for  simulation  are  generated  as  shown  in  Fig.  2.17,  but  to  make  them 
independent,  k  independent  q  (t)  sequences  are  presented  to  k  separate  ,but 
identical,  filters  (see  Fig.  2.18).  Since  the  gain  k  is  variable,  the  qt(f)  sequences 
can  be  restricted  to  being  unit  variance  witl?  no  loss  of  flexibility.  Since  q{t)  is 
gaussian.  a(f)  is  gaussian  since  the  filter  acts  as  a  linear  operator.  To  find 
the  fact  that  a(t)  is  gaussian  gives 

=  E(ae(i))E(ae(j))  +  2(£’(a(i)a(;)))e 

=  (a2)8  +  .  (2.125) 


The  a(f)  sequence  is  stationary  and  therefore  y>oo(i-j)  represents  the  auto 
covariance  matrix  of  a(t).  Values  of  may  be  found  by  the  spectral  fac¬ 

torization  theorem  [14]  as 


Paa(*)  =  Inverse  Z  Transform 


_ fcf _ 

(l-z-l7(*))(l- *?(*-')) 


(2.126) 


The  point  is  that  ^(ifc)  *s  readily  obtainable,  given  the  y(z)  polynomial. 

Since  a(<)  is  gaussian.  it  is  known  that  a4  =  3a 20  These  results  may  be  sub¬ 
stituted  into  equations  2.118  and  2.123  to  get, 


61 


u**  E(xz(i)x2(j)) 


3(A:+a)(3+fc)a28  0  =  i' 

2(*  +  l)?«(0  *■  (k*  +4*  +  3)0*  l<l  ^k  (2.127) 

2(fc  +  l )<pL(0  +  (kz+2k  + 1)0**  k  <l' 


and 


uS~Z  E(xe(i)Pt) 


(k  +l)(3+*)is8  i=f 

(k+l)VL(t-i)  +  (fc  +  l)2^  *<* 


(2.138) 


Using  the  above  equations,  an  optimal  window  was  found  for  the  process 

o*(0  =  1.980^-1)  +  .9801  oJf-2)  +  gt(t)  (2.129) 


which  is  shown  in  Fig.  2.19.  The  same  setup  as  described  in  section  2.3  is  used 
for  a  performance  comparison  between  an  algorithm  using  an  optimal  window 
and  an  algorithm  using  an  exponential  window.  Figure  2.20  displays  the  result. 
The  antenna  array  employing  an  optimal  windowing  strategy  tracks  the  changes 
in  the  true  parameter  more  closely.  The  value  of  the  decay  constant,  a.  for  the 
exponential  window  was  determined  so  that  both  algorithms  had  the  same 
weight  variance.  Further,  Fig.  2.21  shows  another  comparison  where  the 
exponential  windowing  algorithm  was  allowed  to  have  a  higher  weight  variance 
than  in  Fig.  2.20.  Note  that  even  in  this  case,  the  tracking  properties  of  the 
optimal  windowing  algorithm  are  superior. 


Figure  2.20.  Comparison  of  optimal  and  exponential  window  algorithms 


Illustration  of  Improvement  in  misadjustment  and 
tracking  tradeoff  by  use  of  optimal  window. 


2.9  CONCLUSION  AND  SUGGESTIONS  FOR  FURTHER  STUDY 


The  weighted  recursive  least  squares  algorithm  has  been  derived  in  the  con¬ 
text  of  realizing  a  Zahm  adaptive  beamformer.  This  algorithm  has  been  shown 
to  be  a  practical  implementation  of  the  ideal  LMS/Newton  algorithm.  The  ability 
of  the  algorithm  to  track  a  time-varying  jammer  was  shown  to  be  highly  depen¬ 
dent  on  the  exponential  decay  rate  of  the  weighting  coefficients.  Minimization  of 
mean  squared  error  averaged  over  the  nonstationarity  was  found  to  involve  a 
tradeoff  between  weight  variance  and  weight  tracking  speed.  The  restriction  of 
considering  only  exponential  weighting  of  the  outputs  was  then  removed  and  a 
method  for  finding  a  more  complex,  higher  order  optimal  weighting  was  given  in 
sections  2  5  and  2.6.  Section  2.7  demonstrated  that  the  use  of  an  exponential 
weighting  is  optimal  when  the  parameters  of  the  jammer  are  changing  as  a  first 
order  Markov  process  Secion  2.8  demonstrated  that  the  use  of  an  optimal 
weighting  on  the  error  outputs  results  in  an  improvement  of  the  tradeoff 
between  parameter  variance  and  parameter  tracking.  Combining  these  results, 
if  the  jammer  is  believed  to  have  a  simple  time  varying  nature,  exponential 
weighting  may  be  employed  by  a  recursive  algorithm  with  near  optimal  result.  If 
the  jammer  is  thought  to  have  a  complicated  time  varying  structure,  it  would  be 
worthwhile  to  find  an  optimal  data  weighting  and  use  this  for  implementing  the 
adaptive  algorithm.  Considerable  improvements  in  performance  could  result. 

Further  study  will  involve  defining  a  more  general  estimation  error  cri 
terion  than  the  zeroth-order  lags  proposed  here.  Also,  this  performance  cri¬ 
terion  should  include  some  contribution  from  errors  in  estimation  of  the  cross 
correlation  vector  P^f  (of  section  2.4).  Work  should  also  be  done  in  developing  a 
recursive  algorithm  such  as  the  iterative  inverse  using  the  higher  order  weight¬ 
ing  techniques  Finally  these  results  should  be  extended  to  an  n-element  Zahm 
array  and  other  types  of  adaptive  arrays. 
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3.1  Introduction 

Another  way  to  improve  the  effectiveness  and  lower  the  cost  of  an  adaptive 
beamformer  would  involve  the  use  of  a  new  underlying  beamformer  structure. 
In  most  broadband  beamformers  the  basic  structure  is  that  of  a  multi-channel 
FIR  (zeros  only)  digital  filter.  In  this  section  we  describe  a  beamformer  that  is 
based  on  both  HR  and  FIR  (both  poles  and  zeros)  filtering.  It  will  be  shown  that 
an  HR  beamformer  gives  improved  nulling  performance  over  the  conventional 
FIR  beamformer.  As  the  hardware  becomes  available,  beamformers  based  on  I1R 
filtering  may  prove  to  be  extremely  valuabLe  to  the  beamforming  community. 

3.2  Broadband  Beamforming 

A  narrowband  beamformer  uses  a  complex  gain  at  each  element  to  perform 
the  antenna  weighting.  A  complex  gain  is  realized  by  using  a  90  degree  phase 
shifter  to  split  the  signal  path  into  seperate  in-phase  and  quadrature  phase 
channels.  Each  channel  is  then  fed  through  selectable  attenuators  correspond¬ 
ing  to  the  real  and  imaginary  part  of  the  complex  gain.  Finally,  the  two  channels 
are  recombined  by  summation.  A  beamformer  using  complex  weighting  can  null 
interference  which  is  of  narrowband  nature  only  and  is  therefore  useful  only  in 
systems  where  the  desired  signal  is  narrowband. 

In  a  broadband  communication  system  such  as  spread-spectrum,  it  is 
necessary  to  use  antennas  operating  over  a  much  broader  bandwidth.  Broad¬ 
band  adaptive  antennas  were  first  proposed  in  [2],  These  systems  differ  from 
the  narrowband  beamformer  by  replacing  the  complex  weights  with  tap  delay 
line  filters.  A  tap  delay  line  filter  is  commonly  refered  to  as  a  finite  impulse 
response  or  FIR  filter.  A  beamformer  using  FIR  filters  rather  than  complex 
weights  is  able  to  use  temporal  as  well  as  spatial  information  to  eliminate 
unwanted  signals  and  to  pass  the  signal  of  interest. 
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Another  class  of  digital  filters  are  those  possessing  an  infinite  impulse 
response  and  are  appropriately  called  UR  filters.  An  HR  filter  in  many  cases  will 
require  far  fewer  weights  than  an  FIR  filter  to  achieve  an  equally  effective  fre¬ 
quency  response.  In  figure  3. 1,  the  frequency  response  of  an  FIR  and  an  1IR  digi¬ 
tal  filter  are  contrasted  where  the  desired  chacteristic  is  to  pass  all  frequencies 
except  those  in  a  narrow  region.  For  equal  number  of  weights,  notice  the 
improved  performance  of  the  HR  filter  over  the  FIR  filter.  With  these  results  in 
mind,  we  set  out  to  investigate  the  application  of  HR  filters  to  adaptive  beam¬ 
forming.  When  adapting  the  weights  of  an  HR  filter  to  achieve  a  desired  effect, 
one  encounters  many  problems.  In  the  signal  processing  field,  there  have  been 
many  attempts  over  the  past  ten  years  or  so  to  try  to  develop  a  reliable  algo¬ 
rithm  for  adaptive  HR  filtering  [15,16].  Recently,  we  have  developed  a  novel 
technique  for  implementing  an  HR  adaptive  filter  and  are  investigating  it  in 
applications  to  digital  filter  design,  adaptive  contol,  adaptive  noise  canceling, 
adaptive  line  enhancement  and  to  adaptive  beamforming. 

In  the  next  section,  we  will  briefly  describe  the  Zahm  beamformer  and  show 
its  similarity  to  a  general  purpose  adaptive  filter.  We  will  then  describe  our  new 
technique  of  IIR  adaptive  filtering  and  give  simulation  results  demonstrating  the 
performance  of  an  HR  Zahm  beamformer. 

3.3  Broadband  Zahm  Beamformer 

A  two  element  Zahm  beamformer  is  shown  in  figure  3.2.  The  signal  to  be 
received  is  assumed  to  be  of  low  power  compared  to  interference  (jamming)  sig¬ 
nals.  The  beamformer  acts  like  a  signal  power  inverter  causing  high  powered 
jamming  signals  to  be  attenuated  relative  to  the  low  power  desired  signal,  It 
operates  by  minimizing  the  antenna  output  power  subject  to  a  "soft  constraint". 
The  primary  antenna  element  is  coupled  directly,  and  it  maintains  the  soft 
omni-directionality  constraint  and  keeps  the  beamformer  output  power  from 
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Figure  3.1.  Comparison  of  (a)  IIR  and  (b)  FIR  band  reject  filters 
having  equal  numbers  of  degrees  of  freedom. 
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Figure  3.2b.  General  purpose  adaptive  filter. 
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being  driven  to  zero.  Attached  to  the  auxilary  elements  are  FIR  adaptive  filters. 
These  filters  allow  the  flexibility  of  frequency  filtering  as  well  as  spatial  filtering 
to  remove  unwanted  jamming  signals.  Figure  3b  shows  a  general  purpose  adap¬ 
tive  filter.  The  two-element  Zahm  beamformer  is  simply  an  adaptive  filter  whose 
desired  response  and  primary  input  are  connected  to  the  primary  antenna  ele¬ 
ment  and  the  auxilliary  antenna  element  respectively. 


3.4  DR  Adaptive  Filtering 

Keeping  in  mind  the  simple  connection  between  adaptive  beamforming  and 
adaptive  filtering,  we  now  discuss  a  general  purpose  adaptive  HR  filter.  The 
usual  form  of  FIR  adaptive  filter  uses  LMS  algorithm  to  adjust  its  weights  to 
minimize  the  mean  square  error  between  the  desired  response  and  the  filter 
output.  The  FIR  LMS  adaptive  filter  has  been  used  in  a  wide  range  of  applications 
with  favorable  results.  There  has  been  much  research  concerning  the  develop¬ 
ment  of  a  general  purpose  adaptive  IIR  filter  possessing  the  same  robust  charac¬ 
teristic  as  the  FIR  LMS  algorithm. 

In  this  section  we  will  describe  our  technique  for  implementing  an  IIR  adap¬ 
tive  filter.  Figure  3.3  shows  the  basic  structure  of  an  HR  adaptive  filter.  We  use 
z-transform  notation  to  indicate  a  linear  filter,  the  converged  adaptive  filter. 
The  polynomial  B(z)  corresponds  to  the  zeros  or  FIR  part  of  the  digital  filter  and 

the  polynomial  y-^— corresponds  to  the  poles  or  HR  part  of  the  filter. 

The  fixed  term  1  in  the  denominator  polynomial  alleviates  the  problem  of  divid¬ 
ing  by  zero  and  does  not  influence  the  generality  of  the  toted  structure.  The 
adaptive  part  of  this  filter  adjusts  the  coefficients  of  the  A(z)  and  B(z)  polyno¬ 
mial  in  such  a  manner  that  the  mean  squared  output  error  is  minimized.  This  is 
standardly  refered  to  as  an  "  output  error  "  method  since  it  is  directly  minimiz¬ 
ing  the  difference  between  the  desired  response  d  and  the  filter  output  y. 


The  output  error  method  of  adaptive  HR  filtering  has  many  problems.  The 
mean  square  error  is  a  non-quadratic  function  of  the  feedback  weights.  This 
causes  major  difficulties  that  Lie  in  maintaining  algorithm  stability,  providing 
relatively  rapid  convergence  rates  and  guaranteeing  optimality  of  the  converged 
filter  (i.e.  convergence  to  the  global  rather  than  to  a  local  optimum).  In  the  FIR 
case,  these  problems  are  for  the  most  part,  nonexistent. 

For  example,  a  fixed  FIR  filter  is  always  stable.  Since  the  filter  output  is  a 
linear  sum  of  delayed  versions  of  its  input,  a  bounded  input  must  result  in  a 
bounded  output.  In  a  fixed  1IR  filter  this  is  not  the  case.  Figure  3.4  shows  a  sin¬ 
gle  weight  IIR  filter.  The  output  ye  is  generated  by  adding  a  delayed  version  of 
the  output  to  the  input  xt . 

Vt  =  a-iVt-i  +  *t  •  (3  1) 

If  the  coefficient  ay  is  of  magnitude  greater  than  1.  the  filter  output  will  "blow 
up".  This  is  due  to  the  output  feedback  inherent  in  IIR  filtering  making  the  filter 
unstable.  When  adapting  IIR  filters,  one  must  make  sure  the  filter  weights 
remain  in  the  stable  region. 

Another  difficulty  in  IIR  adaptive  filtering  is  one  of  slow  convergence.  Most 
adaptive  IIR  filters  require  enormous'  amounts  of  data  and  consequently  long 
waiting  times  until  the  filter  converges.  This  is  primarily  due  to  the  nature  of 
the  so  called  error  surface  which  contains  many  flat  areas.  Because  most  adap¬ 
tive  algorithms  are  based  on  the  method  of  steepest  descent,  flat  areas  on  the 
error  surface  result  in  slow  convergence.  In  the  usual  FIR  filtering  problem,  this 
is  not  the  case.  The  error  surface  of  an  FIR  adaptive  filter  is  a  quadratic  func¬ 
tion  of  its  weights  and  has  only  one  minimum  and  usually  does  not  possess  fiat 
spots. 

The  technique  we  have  developed  for  adapting  IIR  filters  involves  minimizing 
an  error  which  is  somewhat  different  from  the  output  error.  The  different 
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choice  of  error  alleviates  many  of  the  problems  just  discussed.  Consider  a  refor¬ 
mulation  of  the  basic  1IR  adaptive  filter  shown  in  figure  3.3.  In  the  path  of  the 
error,  place  an  FIR  and  an  HR  filter  in  cascade  having  transfer  functions 


1+2  xA{z)  and 


-respectively.  This  is  shown  in  figure  3.5a.  These  two 


l+z-,i4(z) 

filters  in  cascade  cancel  and  do  not  modify  the  overall  structure.  By  pushing 
the  first  filter  through  the  summation  node,  the  structure  shown  In  figure  3.5a  is 
transformed  into  the  identical  structure  shown  in  figure  3.5b.  Now  cancellation 
of  the  two  cascaded  filters  results  in  the  structure  shown  in  figure  3.5c. 


Notice  the  difference  between  the  error  et  and  the  error  et‘;  one  is  a  filtered 
version  of  the  other  The  error  et  is  called  output  error  and  the  error  e't  is 
called  "equation  error".  This  terminology  comes  from  control  systems  theory. 
It  turns  out  that  minimization  of  equation  error  is  a  much  simpler  task  than 
minimization  of  output  error.  This  is  because  the  mean  square  of  the  output 
error  is  quadratic  in  the  parameters  of  B(z),  but  non-quadratic  function  of  the 
parameters  of  A(z).  On  the  other  hand,  the  mean  square  of  equation  error  is  a 
quadratic  function  of  the  parameters  of  both  A(z)  and  B(z).  But  is  it  reasonable 
to  minimize  equation  error  rather  than  the  natural  output  error?  If  the  A(z)  and 
B(z)  polynomials  have  enough  weights  to  make  the  equation  error  small,  then  it 
is  likely  that  the  output  error  will  also  be  small.  In  fact,  if  the  equation  error  is 
driven  to  zero,  then  the  output  error  will  also  be  zero.  In  this  case,  minimizing 
equation  error  is  equivalent  to  minimizing  output  error. 


A  filter  based  on  minimizing  equation  error  is  shown  in  figure  3.6.  The 
dashed  lines  indicate  that  the  weights  are  to  be  copied  into  the  output  filter 

- - .  - ■  ■.  The  interesting  feature  about  this  structure  is  that  it  can  be 

1+z  M(z) 

adapted  using  a  simple  FIR  LMS  algorithm.  Consequently,  it  will  have  many  of 
the  nice  robust  characteristics  that  the  FIR  LMS  algorithm  has.  The  error  sur- 
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face  will  be  a  quadratic  function  of  the  weights;  there  will  be  no  flat  spots  to  slow 
convergence  and  no  local  minima  to  hinder  global  optimality. 


One  important  detail  has  been  neglected  here.  How  can  we  be  sure  that  the 

output  Alter  - —  —  will  be  stable?  The  roots  of  the  adaptive  filter  which  is 

A  [z  j 

filtering  the  desired  response  could  lie  outside  the  stability  region.  As  such,  this 

would  mean  that  - 7 — ——would  be  unstable.  To  overcome  this  problem,  we 

l+z-*i4(z)  K 

have  developed  a  minimum-phase  constrained  LMS  algortihm.  Minimum  phase 

in  this  case  means  that  all  the  roots  of  l+z~lA(z)  lie  within  the  unit  circle 


(which  is  the  desired  stability  region  for 


1 

l+z~lA(z ) 


) 


If  each  update  of  the 


1+z  J/4(z)  polynomial  is  done  so  that  the  updated  polynomial  is  kept  minimum 
phase,  then  the  output  Alter  will  remain  stable. 

The  key  to  the  new  adaptive  HR  filter  is,  in  fact,  the  development  of  a 
minimum  phase  constrained  LMS  algorithm.  This  algorithm  can  be  described  as 
follows. 


The  minimum  phase  constrained  LMS  algorithm  is  a  method  of  adapting 
polynomial  coefficients  to  minimize  the  mean  squar  e  error  subject  to  the  con¬ 
straint  that  all  roots  of  the  polynomial  lie  either  within  the  unit  circle  or  within 
any  circle  of  prescribed  radius.  As  previously  mentioned,  the  objective  of  this 
algorithm  is  to  generate  an  invertible  polynomial  to  be  used  in  an  adaptive  HR 
filter  structure. 


An  update  of  the  constrained  polynomial  is  accomplished  in  two  steps.  The 
first  step  performs  a  conventional  LMS  update.  The  second  step  alters  the 
updated  polynomial  in  such  a  way  that  the  resulting  polynomial  is  minimum 
phase. 


t 


* 


V 


79 

Step  1.  Conventional  LMS  update: 

1  =  Aj  —  iiSjXj  (3-2) 

where  Aj  is  a  vector  containing  the  coefficients  of  the  A(z)  polynomial  and  e:,X3 
are  defined  as  in  figure  3.6, 

Step  2.  Minimum  phase  projection: 

First  check  the  l+z~M(z)  polynomial  for  minimum  phase.  If  already  minimum 
phase,  this  step  is  terminated.  If  A(z)  is  not  minimum  phase,  then  all  roots  of 
A(z)  are  shrunk  radially  towards  the  origin  by  a  "shrinkage  factor”,  p 

Aj+X{z)  <-  Aj+x{p-lz)  (3.3) 

If  p  is  chosen  to  be  a  positive  number  less  than  1,  then  all  of  the  roots  of  the 
Af+1(z)  polynomial  will  be  drawn  radially  inward.  The  shrinkage  ratio,  p,  is 
reduced  slowly  in  a  series  of  small  steps  until  A(z)  passes  the  minimum  phase 
test.  This  completes  the  adaptation  cycle  When  the  next  data  vector  is  avail¬ 
able,  a  new  adapt  cycle  commences  in  the  Step  1  mode. 

Because  the  minimum  phase  test  may  need  to  be  applied  several  times  per 
adapt  cycle,  it  is  imperative  that  the  test  be  done  in  a  way  that  requires  very  lit¬ 
tle  computation.  An  efficient  test  for  minimum  phase  is  one  based  on  the 
lattice-form  realization  of  FIR  filters  due  to  Itakura  and  Saito  [17].  The  polyno¬ 
mial  1  +z~xA(z)  corresponds  directly  to  a  FIR  transversal  filter.  An  equivalent 
FIR  lattice-form  filter  can  be  constructed  from  knowledge  of  the  transversal 
filter  weights.  Both  FIR  filters  will  have  the  same  impulse  response.  The  polyno¬ 
mial  l+z~'<4(z)  is  minimum  phase  if  and  only  if  all  of  the  weights  (also  known  as 
reflection  coefficients)  of  the  equivalent  lattice  have  magnitude  less  than  one. 
The  beauty  of  this  test  lies  in  the  ease  with  which  the  transversal  filter  is 
transformed  into  an  equivalent  lattice  filter.  An  algorithm  for  converting  poly¬ 
nomial  coefficients  into  the  lattice  filter  reflection  coefficients  is  described 
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below.  This  algorithm  lends  itself  readily  to  computer  implementation,  requir¬ 
ing  memory  space  and  computations  proportional  to  the  square  of  the  filter 
order.  A  few  years  ago,  we  would  not  consider  such  an  approach.  Today,  pro¬ 
cedures  of  this  type  are  with  the  realm  of  practicality. 

The  transversal  filter  polynomial  to  be  tested  can  be  represented  by 

1  +  z~'A( z)  =  1  +  +  •  •  •  +  .  (3.4) 

Define  an  nxm  matrix  Q  and  let  Q(i,j)  indicate  the  ith  row  and  the  jth  column  of 

this  matrix.  Initialize  the  bottom  row  of  the  Q  matrix  to  the  filter  polynomial 
coefficients 

For  j- 1.2 . m 

let  Q(m,j )  (3.5) 

Now  implement  the  following  recursion 

For  i  -  m.  m- 1,  •  ■  •  ,  1 
For  j  =  1,  2,  ■  •  •  ,  i-1 

Ut  «<i-u) .  ?(«;) .  (3.6) 

Let  fcj  be  the  ith  reflection  coefficient.  These  coefficients  can  now  be  read  off 
the  ith  diagonal  of  the  Q  matrix 

For  j=l.  2 . m 

Let  (3.7) 

The  filter  is  minimum  phase  if  and  only  if  all  of  the  reflection  coefficients  are  of 
magnitude  less  than  one. 

1 +z~,/l(z)  minimum, phase  <??>  fcj<  1  fori  =  1,2,  ....  m  .  (3.0) 

This  concludes  the  procedure  for  testing  a  polynomial  for  minimum  phase  and 
completes  the  overall  description  of  the  mjnimum  phase  constrained  LMS  algo¬ 
rithm. 
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The  minimum  phase  LMS  algorithm  is  currently  being  studied  with  the  goal 
of  developing  convergence  and  optimality  proofs.  Although  the  algorithm  is 
unproven  mathematically,  it  has  been  tested  extensively  by  computer  simula¬ 
tion  and  has  never  failed  to  converge  and  find  an  optimal  solution.  It  is  a  very 
important  new  development  which  will  have  applications  in  adaptive  control, 
digital  filter  design,  adaptive  noise  canceling,  and  adaptive  antennas. 

3.5  HR  Zahm  Bearn  former,  a  description 

In  this  section,  we  show  how  to  use  the  UR  adaptive  filter  to  realize  an  HR 
Zahm  beamformer.  Figure  3.7  shows  a  3-element  HR  Zahm  beamformer 
configured  to  minimize  output  error.  As  we  previously  discussed,  minimization 
of  this  output  error  is  a  difficult  task.  Instead  we  choose  to  reconfigure  the  sys¬ 
tem  by  a  series  of  evolutionary  steps  shown  in  figure  3.Ba-e,  to  minimize  equa¬ 
tion  error 

First,  we  seperate  the  filters  into  all-zero,and  all-pole  sections  as  shown  in 
figure  3.8b.  Next  we  establish  a  common  denominator  by  inserting  a  filter 

j- - 1 1  r.1;- - -i  .  directly  after  the  summer,  and  compensate  by  plac- 

(1+z  Al(z)){  1+z  A2\z)) 

ing  the  inverse  of  this  filter  at  the  input  to  the  summer.  The  introduction  of 
these  two  filters  has  not  changed  the  overall  system,  since  they  cancel  each 
other's  effects.  Note  that  if  the  filter  1  +z-Mi(z)  is  adapted  in  one  place,  all 
other  copies  Aj(z)  must  receive  the  same  update,  which  is  indicated  by  the 
dashed  lines.  The  new  configuration  of  figure  3.8d  behaves  the  same  as  that  of 
figure  3.8a,  since  we  have  simply  rearranged  the  system  of  figure  3,8a.  A  further 
simplification  is  to  treat  the  cascaded  filters  B,(z)(l  +  z~lAs(z))  and 
Bz(z)(l+z~xAl(z))  as  single  filters  denoted  by  B\  and  B‘z  respectively.  The  sys¬ 
tem  of  figure  3.8d  will  be  adapted  using  the  equation  error  e'  rather  than  than 
the  true  output  error  e .  This  eliminates  the  HR  part  of  the  beamformer  from 
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the  adaptation,  loop,  so  a  simpLe  algorithm  for  updating  FIR  filters,  such  as  LMS, 
can  be  used.  For  simplicity  we  define  the  product  of  the  two  FIR  filters  as  one 
single  filter, 

(l+z-Mi(z))(H-z-,i42(2))  =  14-z"lA(z).  (3.9) 

The  above  result  is  easily  generalized  to  a  larger  array  with  a  greater  number  of 
auxilliary  antenna  elements  and  is  shown  in  figure  3.9. 

t 

1 


The  weights,  which  are  copied  into  the  output  filter 


-,  must  not 


1+z  lA(z)‘ 

cause  this  filter  to  become  unstable.  To  insure  stability,  we  update  the  A(z) 
polynomial  using  the  minimum  phase  LMS  algorithm  discussed  in  the  previous 
section.  The  other  polynomials  are  adapted  with  the  conventional  LMS  algo¬ 
rithm. 

In  the  next  section,  we  will  show  simulation  results  using  the  1IR  Zahm 
beamformer  of  figure  3.7.  The  Z?t(z)  filters  are  adapted  by  LMS  and  the  A(z)  is 
adapted  using  minimum  phase  LMS. 

3.6  HR  Zahm  Beamformer,  simulation  results 

In  the  following  experiment,  we  compare  the  performance  of  a  conventional 
Zahm  beamformer  to  an  HR  Zahm  beamformer.  To  make  a  valid  comparison, 
the  number  of  degrees  of  freedom  in  both  beamformers  must  be  equal. 
Specifically,  we  will  use  a  4-element  broadband  Zahm  beamformer  with  linear 
element  placement  and  4  taps  per  element.  This  corresponds  to  12  degrees  of 
freedom  The  HR  Zahm  beamformer  will  have  3  elements  with  the  same  linear 
placement  and  four  taps  per  element  corresponding  to  11  degrees  of  freedom. 
The  missing  degree  of  freedom  arises  because  of  the  fixed  1  in  the  1 +z"'A(z) 
polynomial.  Incident  upon  this  array  is  a  broadband  jammer  having  a  bandwidth 
equal  to  10%  of  its  center  frequency.  The  desired  signal  is  assumed  to  be  of  low 
power  and,  consequently  has  little  effect  on  the  converged  beam  pattern.  The 
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Figure  3.9.  Simplified  IIR  version  of  Zahm  adaptive  beamformer 
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ideal  behavior  of  this  form  of  adaptive  antenna  would  be  to  completely  eliminate 
the  jamming  signal  from  the  system  output  by  forming  a  broadband  null  in  the 
direction  of  the  jammer.  Unfortunately,  perfect  broadband  nulls  could  only  be 
obtained  by  implementing  exact  delays  on  all  the  antenna  element  circuits. 
With  such  delays  the  signals  can  be  recombined  in  such  a  manner  that  perfect 
cancellation  can  occur.  Physically,  this  would  require  an  infinite  range  of  adju¬ 
stable  delay.  If  two  jammers  were  present,  it  would  be  theoretically  impossible 
to  form  two  perfect  broadband  nulls.  Good  broadband  nulls  can  be  obtained  in 
practice  using  finely  spaced  delay  line  taps.  As  such,  one  can  remove  most  of 
the  broadband  jamming  power.  Figure  3. 10  shows  the  frequency  reponse  of  the 
converged  beamformer  in  the  direction  of  the  jammer  for  both  the  conventional 
Zahm  array  and  the  HR  Zahm  array.  Notice  the  improved  notching  performance 
of  the  HR  beamformer.  It  is  able  to  remove  a  great  deal  more  of  the  jammer 
power  than  the  conventional  beamformer.  Although  additional  simulations  need 
to  be  inn.  we  feel  that  this  simple  example  demonstrates  the  possible  improve¬ 
ments  which  could  be  achieved  using  IIR  beamforming. 

3.7  Further  Work 

What  we  have  seen  thus  far  is  probably  just  the  "tip  of  an  iceberg".  Work  is 
now  proceeding  to  extend  these  results  to  an  IIR  version  of  a  Frost  beamformer. 
The  Frost  beamformer  does  not  require  the  assumtion  of  a  iow-power  desired 
signal,  but  does  assume  knowledge  of  the  signal  look-direction.  Additional  simu¬ 
lations  and  mathematical  analysis  must  be  done  to  verify  the  expected  perfor¬ 
mance  improvements  under  a  wider  range  of  signal,  noise,  and  jammer  condi¬ 
tions.  Analytical  expressions  for  optimal  beam  patterns  are  being  derived. 
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(b)  I IR  beamformer 


Figure  3.10.  Comparison  of  frequency  response  in  the  direction  of  a  broadband 
jammer  for  (a)  conventional  Zahm  beamformer  and  (b)  1 1 R  Zahm 
beamformer.  Notice  the  notch  improvement  obtained  using  the 
1 IR  beamformer. 
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APPENDIX  A 


The  matrix  inversion  lemma  is  given  by 


choosing 


(A+BCD)-'  =  A~l  - A~lB[C~l+DA~xB]~lDA~l 


A  =  aR{t)  =  ±-P(t) 


B  -  DT  -  X{t  +1) 


C  =  1  . 


(«*(*)+ *(*+l)*r(f  +  !))■'  =  —  P(t)  -  ~P(t)X(t  +  l) 


[1  +  —  XT(t+l)P(t)X(t+l)]~,XT(t+l)P(t)  — 


=  — [p(o  -  wm+m (A. 5) 


APPENDIX  B 


From  the  text,  equation  2.20,  the  weight  update  is  given  as: 

+  =  P(t  +  1)(  £  aen(0«i(i)2:(i)  +  +l)Jf(t  +  l))  .  (B.l) 

<+i 

Substituting  Eq.  (la)  gives 

M(t+n  =  l-(p(t)  -  PiLm±X£liLt£££L 

(£  aai(t)d(i)X{i)  +  d(t  +l)X(t+l)) 

d(fM)P(0*(* +l)JCr(*  +  l)P(W/+-l)  -  P(t)X(t  +  l)Xr(t  +  l)P{t)d{t  +  l)X(t+l) 
- aP(t)X(t+l)XT{t+a)P(t )  £  at(Od(i>X(i)) 

<=i 

*  ™  *  (a  ,I>(ltypW(»l))  +  IB  2) 

which  gives  the  following  recursive  update  for  J%(() 

*  ■K(0  +  .a'd>WW».|  f|‘Wl<lW|l,|) 

)*(*  +  !))  .  (B.3) 
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